kaggle-yelp-restaurant-photo-classification

Clarification of feature extraction and processing

Open • CP121 opened this issue 7 years ago • 1 comment

I would be very grateful if you could clarify your feature extraction workflow, as much of the file compress.py is commented out. Is the following correct?

  1. extract features from the training data using n ImageNet pretrained models

  2. normalise each feature matrix separately using sklearn.preprocessing.normalize

  3. concatenate all feature matrices horizontally (axis=1)

  4. calculate the column mean 'm' (axis=0) and subtract it

  5. apply truncated SVD using sklearn.decomposition.TruncatedSVD with n_components=64 and algorithm='arpack'

  6. repeat for the test data: normalise, concatenate, subtract the mean 'm' (calculated from the training data in step 4), and transform using the SVD (which was fit on the training data in step 5).
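A minimal sketch of steps 2 to 6 as listed above, assuming NumPy and scikit-learn. The original compress.py is not reproduced here, so this is not the author's code: random arrays stand in for the per-model CNN features from step 1, and the helper names `fit_compressor` and `transform_features` are hypothetical.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import normalize


def fit_compressor(train_feature_sets, n_components=64):
    """Steps 2-5, fit on the training data only (hypothetical helper).

    train_feature_sets: list of arrays, one per pretrained model,
    each of shape (n_samples, d_i).
    """
    # Step 2: L2-normalise each model's feature matrix row-wise, one matrix at a time.
    normed = [normalize(f) for f in train_feature_sets]
    # Step 3: concatenate the normalised matrices horizontally (axis=1).
    x = np.hstack(normed)
    # Step 4: column mean 'm' over the training samples, then subtract it.
    m = x.mean(axis=0)
    # Step 5: truncated SVD with the ARPACK solver on the centred matrix.
    svd = TruncatedSVD(n_components=n_components, algorithm="arpack")
    train_z = svd.fit_transform(x - m)
    return m, svd, train_z


def transform_features(feature_sets, m, svd):
    """Step 6 (hypothetical helper): reuse the training mean and fitted SVD."""
    x = np.hstack([normalize(f) for f in feature_sets])
    return svd.transform(x - m)


# Usage with random stand-ins for two models' features (128- and 96-dim).
rng = np.random.default_rng(0)
train_sets = [rng.random((200, 128)), rng.random((200, 96))]
test_sets = [rng.random((50, 128)), rng.random((50, 96))]

m, svd, train_z = fit_compressor(train_sets)
test_z = transform_features(test_sets, m, svd)
print(train_z.shape, test_z.shape)  # (200, 64) (50, 64)
```

The important detail is step 6: the mean 'm' and the SVD basis come only from the training data, so test features are projected into the same 64-dimensional space without using any test-set statistics.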

Thank you.

CP121, Apr 18 '17 04:04

Sorry for the late reply, and thanks for your interest. In general you are right, but I used 7 different feature sets, described here, and some steps were skipped for some of the feature sets.
Steps 4 and 5 together are essentially PCA. Features from the model trained on the full ImageNet gave me the main boost in accuracy. If you have any questions, please feel free to ask.
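To illustrate why steps 4 and 5 amount to PCA, this sketch compares mean-centring followed by `TruncatedSVD` against `sklearn.decomposition.PCA` on random data (NumPy and scikit-learn assumed; the data and sizes are arbitrary). The two projections should agree up to a sign flip per component, which is inherent to singular vectors.

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD

rng = np.random.default_rng(42)
x = rng.normal(size=(100, 20))
k = 5

# Steps 4-5: subtract the column mean, then run truncated SVD (ARPACK).
x_centered = x - x.mean(axis=0)
z_svd = TruncatedSVD(n_components=k, algorithm="arpack").fit_transform(x_centered)

# Standard PCA, which centres the data internally before its SVD.
z_pca = PCA(n_components=k, svd_solver="full").fit_transform(x)

# Singular vectors are only defined up to sign, so align each column's sign.
signs = np.sign(np.sum(z_svd * z_pca, axis=0))
print(np.allclose(z_svd, z_pca * signs, atol=1e-6))  # True
```

`TruncatedSVD` does not centre its input on its own, which is why the explicit mean subtraction in step 4 is what turns it into PCA.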

u1234x1234, Apr 22 '17 13:04