Improvements for Principal Component Analysis

Open bytesnake opened this issue 5 years ago • 3 comments

A plain Principal Component Analysis algorithm was added in https://github.com/rust-ml/linfa/commit/7b6075e2dc9cc1c56ad7cd956bf996d69ce51d20. The next steps should improve edge-case handling and add features:

  • [ ] implement Roweis Discriminant Analysis which mixes supervised and unsupervised models
  • [ ] implement sparse PCA. By adding a sparsity constraint (like LASSO) on the loadings, each principal component is built from only a few of the input variables
  • [ ] implement robust PCA to improve robustness to outliers by using an L1 norm instead of the usual Frobenius norm
  • [ ] (?) implement non-linear PCA (should be similar to diffusion maps except for scaling)
  • [ ] add tests for edge-cases for very large, sparse or ill-behaving datasets
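
Both the sparse and the robust variant listed above lean on an L1 penalty, and the basic building block for that is the soft-thresholding operator. The following is only a minimal, std-only sketch of that operator applied to a single loading vector. The names `soft_threshold` and `sparsify_loadings` are hypothetical and not part of linfa, and a real sparse PCA solver would iterate a step like this together with a re-estimation of the components.

```rust
/// Soft-thresholding operator (hypothetical helper, not a linfa API):
/// shrinks `x` towards zero by `lambda` and returns exactly zero
/// whenever |x| <= lambda.
fn soft_threshold(x: f64, lambda: f64) -> f64 {
    x.signum() * (x.abs() - lambda).max(0.0)
}

/// Applies soft-thresholding to every entry of a loading vector and
/// rescales the result to unit Euclidean norm. If every entry is
/// shrunk to zero, the all-zero vector is returned unchanged.
fn sparsify_loadings(loadings: &[f64], lambda: f64) -> Vec<f64> {
    let shrunk: Vec<f64> = loadings
        .iter()
        .map(|&w| soft_threshold(w, lambda))
        .collect();
    let norm = shrunk.iter().map(|w| w * w).sum::<f64>().sqrt();
    if norm > 0.0 {
        shrunk.iter().map(|w| w / norm).collect()
    } else {
        shrunk
    }
}
```

Calling `sparsify_loadings(&[0.9, 0.1, -0.4], 0.2)` zeroes the small middle entry and keeps the other two with their signs, which is the sense in which the penalty selects variables. A larger `lambda` gives sparser loadings at the cost of explained variance.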

bytesnake • Jul 14 '20 06:07

Sparse PCA depends on #46.

bytesnake • Nov 21 '20 08:11

It seems that some sparse/robust PCA tests use the Yale face dataset. It might be too big for linfa-datasets (it is a lot of image data), but I think it is too nice a "real-world" example to pass up. Maybe we could have a separate repository for this test?

I am not sure about the licensing, though: the page I linked above does not mention it, and the link it gives to the original dataset seems broken.

EDIT: Another page about the original dataset says:

NOTE: You are free to use the Yale Face Database B for research purposes. If experimental results are obtained that use images from within the database, all publications of these results should acknowledge the use of the "Yale Face Database B" and reference this paper. Without permission from Yale University, images from within the database cannot be incorporated into a larger database which is then publicly distributed.

sjaustirni • Jul 07 '21 14:07

Perhaps you could take a look at the mnist crate and implement something similar for the face dataset? There is also an open pull request to replace the downloader: https://github.com/davidMcneil/mnist/pull/8

bytesnake • Jul 18 '21 09:07