Daddy's Technology Notes

Read, think, and write down the notes.

Thursday, June 30, 2005

Feature Selection using Canonical Analysis

Canonical correlation analysis is a procedure for assessing the relationship between two sets of variables, rather than between a single pair of variables.

Consider a linear function f(X) = AX. Each group's distribution is projected onto the lines defined by A; for each projection, we can compute the variance within the groups (SSi) and the variance between groups (SSb). The idea is to find the projection that minimizes SSi and maximizes SSb, i.e. maximizes the ratio SSb/SSi.
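To make the SSb/SSi criterion concrete, here is a minimal sketch in plain Python. It is not a full canonical analysis: it only evaluates the ratio for two candidate projection directions, the two coordinate axes. The toy data and the names `project` and `ss_ratio` are made up for illustration.

```python
def project(point, direction):
    """Scalar projection of a point onto a direction vector (dot product)."""
    return sum(p * d for p, d in zip(point, direction))


def ss_ratio(groups, direction):
    """Return (SSb, SSi, SSb/SSi) for the groups projected onto `direction`."""
    projected = [[project(x, direction) for x in g] for g in groups]
    all_values = [v for g in projected for v in g]
    grand_mean = sum(all_values) / len(all_values)

    ss_between = 0.0  # SSb: spread of the group means around the grand mean
    ss_within = 0.0   # SSi: spread of the points around their own group mean
    for g in projected:
        group_mean = sum(g) / len(g)
        ss_between += len(g) * (group_mean - grand_mean) ** 2
        ss_within += sum((v - group_mean) ** 2 for v in g)
    return ss_between, ss_within, ss_between / ss_within


# Made-up toy data: two groups that differ mainly along the first feature.
group_a = [(1.0, 2.0), (2.0, 1.0), (1.5, 3.0), (2.5, 2.0)]
group_b = [(6.0, 2.5), (7.0, 1.5), (6.5, 3.0), (7.5, 2.0)]
groups = [group_a, group_b]

ratio_x = ss_ratio(groups, (1.0, 0.0))[2]  # project onto the first feature
ratio_y = ss_ratio(groups, (0.0, 1.0))[2]  # project onto the second feature
print("SSb/SSi along feature 1:", ratio_x)
print("SSb/SSi along feature 2:", ratio_y)
```

In this toy data the first feature gives a much larger ratio than the second, so it would be the better candidate if only one feature were kept. A real analysis would search over all directions instead of just the axes.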

Monday, June 27, 2005

Differences among Clustering, ANOVA and Classification

Clustering looks for potential groups in a data set. Before clustering, we don't know whether the data belong to different groups or not.

ANOVA determines whether the mean of some measure differs between two or more known groups of data.

Classification assigns a new case to one of several known groups, based on the available data set.
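The three tasks can be contrasted on the same toy numbers. The sketch below is only an illustration in plain Python, with made-up data and hypothetical helper names (`two_means`, `anova_f`, `classify`): a tiny two-cluster k-means stands in for clustering, a one-way F statistic for ANOVA (without the p-value lookup), and a nearest-mean rule for classification.

```python
def mean(values):
    return sum(values) / len(values)


def two_means(values, iterations=10):
    """Clustering: split unlabeled 1-D values into two groups (tiny 2-means)."""
    centers = [min(values), max(values)]  # simple initial guess
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            clusters[nearest].append(v)
        # Keep the old center if a cluster happens to be empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return clusters


def anova_f(groups):
    """ANOVA: one-way F statistic for groups that are already known."""
    all_values = [v for g in groups for v in g]
    grand = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def classify(new_value, labeled_groups):
    """Classification: assign a new case to the known group with the nearest mean."""
    return min(labeled_groups,
               key=lambda name: abs(new_value - mean(labeled_groups[name])))


# Made-up measurements.
values = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]

# Clustering: no labels are given; the groups are discovered.
clusters = two_means(values)

# ANOVA: labels are given; we ask whether the group means differ.
labeled = {"low": [1.0, 1.2, 0.8], "high": [5.0, 5.3, 4.7]}
f_stat = anova_f(list(labeled.values()))

# Classification: labels are given; we place a new case into one group.
label = classify(4.0, labeled)

print(clusters, f_stat, label)
```

A large F statistic suggests the group means differ; turning it into a significance statement requires comparing it with the F distribution, which this sketch leaves out.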