
Defining Necessary Number of Dimensions

Open BradKML opened this issue 1 year ago • 1 comments

For fun I also borrowed some other data from This Link to see how personality and test performance can be condensed into a dimensionally reduced model. personality_score.csv

Question 1: What is the proper way of selecting a sufficient number of dimensions that preserves the data while avoiding noise? Kaiser–Meyer–Olkin, Levene, and others all seem to be better descriptors than the "eigenvalue > 1" rule.

Question 2: Can PCA be integrated with something else so that it behaves like PCR or Lasso regression? (as in reducing the number of unnecessary columns before attempting to be accurate)

Question 3: Can ICA be used to discover significant columns? It is seen as a way to isolate components after using PCA to assess the proper dimension count.
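For question 1, this is the kind of thing I am imagining: compare the cumulative-explained-variance rule against Horn's parallel analysis (keep components whose eigenvalues beat those of column-shuffled noise). Synthetic data stands in for personality_score.csv here, and the 95% threshold and percentile are just my own guesses:

```python
# Sketch: two ways to pick the number of components, on synthetic data
# with 3 true latent factors hidden in 10 columns plus noise.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
latent = rng.normal(size=(300, 3))
X = latent @ rng.normal(size=(3, 10)) + 0.5 * rng.normal(size=(300, 10))
Xs = StandardScaler().fit_transform(X)

# Rule A: smallest k whose cumulative explained variance exceeds 95%
pca_full = PCA().fit(Xs)
cumvar = np.cumsum(pca_full.explained_variance_ratio_)
k_var = int(np.searchsorted(cumvar, 0.95)) + 1

# Rule B: parallel analysis -- keep components whose eigenvalue beats
# the 95th percentile of eigenvalues from column-shuffled (noise) data
n_iter = 100
rand_eigs = np.empty((n_iter, Xs.shape[1]))
for i in range(n_iter):
    Xr = np.column_stack([rng.permutation(col) for col in Xs.T])
    rand_eigs[i] = PCA().fit(Xr).explained_variance_
threshold = np.percentile(rand_eigs, 95, axis=0)
k_pa = int(np.sum(pca_full.explained_variance_ > threshold))

print(k_var, k_pa)
```

Parallel analysis tends to be stricter than the variance threshold, since pure-noise eigenvalues set the bar instead of an arbitrary cutoff.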

```python
!pip install pca  # notebook install of the pca package
from pandas import read_csv
from pca import pca

df = read_csv('https://files.catbox.moe/4nztka.csv')
df = df.drop(columns=df.columns[0])  # drop the unnamed index column
y = df[['AFQT']]                     # test-performance target
X = df.drop(columns=['AFQT'])        # personality columns
model = pca(normalize=True)          # normalize=True standardizes the columns first
results = model.fit_transform(X)
print(model.results['explained_var'])  # explained variance of the components
fig, ax = model.plot()
fig.savefig('personality_performance.png')
```
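For question 2, my understanding is that PCA can simply be one step of a regression pipeline, which is essentially PCR, and that can be compared against Lasso. A sketch with synthetic data standing in for the AFQT regression above (the alpha and component count are arbitrary choices of mine):

```python
# Sketch: PCR (PCA -> linear regression) vs Lasso on data where only
# 3 of 10 columns actually drive the target.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
y = X[:, 0] - 2 * X[:, 1] + 0.5 * X[:, 2] + 0.1 * rng.normal(size=200)

# PCR: standardize -> keep a few components -> ordinary least squares
pcr = make_pipeline(StandardScaler(), PCA(n_components=3), LinearRegression())
pcr_r2 = cross_val_score(pcr, X, y, cv=5).mean()

# Lasso: shrinks irrelevant coefficients all the way to zero
lasso = make_pipeline(StandardScaler(), Lasso(alpha=0.1))
lasso_r2 = cross_val_score(lasso, X, y, cv=5).mean()

print(round(pcr_r2, 3), round(lasso_r2, 3))
```

One thing I noticed: since PCA is unsupervised, its components need not line up with the columns that predict y, so Lasso can win on data like this even though both "reduce columns" in some sense.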

personality_performance.png
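And for question 3, this is roughly what I mean by using ICA after PCA: FastICA whitens internally (a PCA-like step), so its n_components plays the same role as the dimension count chosen earlier, and the magnitude of the mixing matrix then hints at which original columns load on each independent source. Synthetic non-Gaussian sources stand in for real data:

```python
# Sketch: recover 2 independent non-Gaussian sources mixed into
# 5 observed columns, then inspect the column loadings.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(2)
S = rng.laplace(size=(500, 2))          # independent Laplace sources
A = rng.normal(size=(2, 5))             # mixing into 5 observed columns
X = S @ A + 0.05 * rng.normal(size=(500, 5))

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)            # estimated independent sources

# Columns with large absolute loadings in mixing_ are the "significant"
# ones for each recovered source
loadings = np.abs(ica.mixing_)          # shape: (n_columns, n_sources)
print(loadings.shape)
```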

BradKML avatar Oct 25 '22 08:10 BradKML