
[BUG] UMAP points are sampled from start of dataframe rather than randomly

Open axiomofjoy opened this issue 1 year ago • 0 comments

Describe the bug: UMAP points are sampled from the head of the dataframe rather than randomly. This is a problem when the rows of the dataframe are grouped by class, because the point cloud may then represent only a small subset of the classes. I ran into this naturally while wrangling a dataset, and I expect others will hit it as well.

To Reproduce: Run the Colab.

Expected behavior: I expect a random sample of points to be taken so that all the classes present in my dataframe are represented in the point cloud.
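To illustrate the difference, here is a minimal sketch in plain Python. It is not Phoenix's actual sampling code: the dataset, the `label` field, and the helpers `sample_head` and `sample_random` are all hypothetical names made up for this example. With pandas, `DataFrame.sample(n=..., random_state=...)` would play the role of the random variant.

```python
import random
from collections import Counter

# Hypothetical dataset: 300 rows grouped by class, as described in the report.
labels = ["cat"] * 100 + ["dog"] * 100 + ["bird"] * 100
rows = [{"id": i, "label": label} for i, label in enumerate(labels)]


def sample_head(rows, n):
    """Take the first n rows (the behavior reported as a bug)."""
    return rows[:n]


def sample_random(rows, n, seed=0):
    """Take up to n rows uniformly at random, without replacement."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return rng.sample(rows, min(n, len(rows)))


n = 50
head_counts = Counter(row["label"] for row in sample_head(rows, n))
random_counts = Counter(row["label"] for row in sample_random(rows, n))

print("head:  ", dict(head_counts))    # only the first class appears
print("random:", dict(random_counts))  # classes drawn from the whole dataset
```

Because the rows are grouped by class, the head sample contains only the first class, while the random sample draws from every part of the dataset.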

axiomofjoy, May 04 '23 05:05