scikit-lego

Added usage examples

Open anopsy opened this issue 1 year ago • 4 comments

Docs

Added usage examples to:

  • decomposition.umap_reconstruction.UMAPOutlierDetection
  • decomposition.pca_reconstruction.PCAOutlierDetection

Fixes #652 and #653

Type of change

  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

anopsy • Apr 27 '24 13:04

[screenshot]

Wondering what happened here?

anopsy • Apr 27 '24 13:04

Oh gee, there are even more failed tests now.😮

About the examples: I copied the approach with the arrays from the scikit-learn docs on PCA (see the screenshot).

[screenshot]

I also tinkered with the values in the arrays, with n_components, and with the thresholds to check how the detectors behave. I tried adding values that would clearly be outliers in terms of quantiles, for example [-100, 99, -99], but PCAOutlierDetection and UMAPOutlierDetection "classified" them as inliers. I also noticed in the User Guide that the points these detectors flag as outliers don't look like quantile-based outliers, so you can't spot them just by looking. If that makes any sense.

[screenshot]
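A possible explanation for the quantile observation, offered as a sketch under assumptions: reconstruction-based detectors score a point by how far it lies from the fitted low-dimensional subspace, not by how extreme its individual coordinates are. The snippet below uses plain NumPy (not the scikit-lego API) to fit a one-component PCA via SVD and compute a Euclidean reconstruction error. The helper names `fit_pca` and `reconstruction_error`, the toy data and the threshold are all hypothetical, and the library's own distance and thresholding may differ.

```python
import numpy as np

def fit_pca(X, n_components):
    """Hypothetical helper: return column means and the top `n_components` principal axes."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def reconstruction_error(X, mean, components):
    """Hypothetical helper: Euclidean distance between each row and its PCA projection."""
    centered = X - mean
    projected = centered @ components.T @ components
    return np.linalg.norm(centered - projected, axis=1)

# Made-up training data scattered tightly around the line y = x.
X_train = np.array([
    [-3.0, -3.1], [-2.0, -1.9], [-1.0, -1.1], [0.0, 0.1],
    [1.0, 0.9], [2.0, 2.1], [3.0, 2.9],
])
mean, components = fit_pca(X_train, n_components=1)

# Made-up test points: the first is far outside the training range on every
# coordinate but sits on the principal axis; the second is inside the range
# but off the axis.
X_test = np.array([[100.0, 100.0], [1.0, -1.0]])
errors = reconstruction_error(X_test, mean, components)

threshold = 0.5  # hypothetical cut-off, chosen by hand for this toy data
is_outlier = errors > threshold
print(errors.round(3), is_outlier)
```

In this toy setup the quantile-extreme point [100, 100] gets a small error and stays an inlier, while [1, -1] is flagged. Whether the same mechanism explains the [-100, 99, -99] case depends on the training arrays used, which aren't reproduced here.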

anopsy • Apr 28 '24 10:04

Yep, the doc page uses the iris dataset, which I would not expect to have any particular outliers. We have one obvious example in the test suite.

@koaning thoughts on this? In my opinion, it could be worth changing the dataset in the user guide as well. It seems a bit confusing.

FBruzzesi • Apr 29 '24 13:04

Sure, I'll do it the way it's done in the test suite.

anopsy • Apr 30 '24 07:04


Hey Francesco, I'm back at it. I've been thinking about the examples for PCA and UMAP. What bugs me is this: if I use the obvious example from the test suite, which is a 10-d array, how can I show the resulting outliers? Should I print the 10-d output? The numbers in the arrays I used may seem arbitrary, but at least the outliers can be shown in a simple one-line output. Let me know what you think.
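One way to avoid printing 10-dimensional rows, sketched under assumptions: print only the per-row labels (or the indices of the flagged rows), which stays a one-line output whatever the dimensionality. The snippet uses plain NumPy rather than the scikit-lego detectors. The data, the two-component subspace, the threshold and the 1/-1 labelling (the convention scikit-learn's outlier detectors use) are illustrative choices, not the test-suite example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up 10-d data living on a 2-d subspace: the first five coordinates
# share one latent value and the last five share another.
basis = np.array([
    [1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0],
])
X_train = rng.normal(size=(20, 2)) @ basis

# Fit a 2-component PCA via SVD on the clean training data.
mean = X_train.mean(axis=0)
_, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
components = vt[:2]

# New rows: two on the subspace, plus a copy of the first with coordinate 0 bumped by 5.
inliers_new = rng.normal(size=(2, 2)) @ basis
outlier = inliers_new[0] + 5.0 * np.eye(10)[0]
X_new = np.vstack([inliers_new, outlier])

# Reconstruction error = distance from each row to its projection on the subspace.
centered = X_new - mean
errors = np.linalg.norm(centered - centered @ components.T @ components, axis=1)

threshold = 1.0  # hypothetical cut-off
labels = np.where(errors > threshold, -1, 1)  # 1 = inlier, -1 = outlier
print(labels)                        # one line, regardless of dimensionality
print(np.flatnonzero(labels == -1))  # indices of the flagged rows
```

Assuming the scikit-lego detectors follow the same scikit-learn convention for `predict`, a docs example could print that output (or the flagged indices) instead of the 10-d rows themselves.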

anopsy • Jul 17 '24 14:07