amazon-denseclus

UMAP hyperparameters question

Open timofeytkachenko opened this issue 1 year ago • 1 comments

There are three groups of UMAP hyperparameters. When should we use the categorical and numerical dicts, and when the combined dict?

default_umap_params = {
    "categorical": {
        # jaccard is an option but only takes sparse input
        "metric": "hamming",
        "n_neighbors": 30,
        "n_components": 5,
        "min_dist": 0.0,
    },
    "numerical": {
        "metric": "l2",
        "n_neighbors": 30,
        "n_components": 5,
        "min_dist": 0.0,
    },
    "combined": {
        "n_neighbors": 30,
        "min_dist": 0.0,
        "n_components": 5,
    },
}

timofeytkachenko, Dec 26 '23 15:12

@timofeytkachenko Ideally, your use case won't require going too deep into the weeds, and you can use the presets. The "combined" parameters are used when it creates an intersection_union_mapper mapper, like here, and thus needs to fit a third UMAP over the other two. Regardless, it always fits two UMAPs, one using the numerical parameters and one using the categorical parameters. We've got another notebook coming that shows these settings and might shed more light, but please use the first one until then. FYI @bharven @srushtii-aws

momonga-ml, Jan 05 '24 18:01
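For readers who do want to move away from the presets, here is a minimal sketch of overriding individual values while keeping all three groups intact. It is plain Python and does not call the library. The helper `override_umap_params` is hypothetical (it is not part of amazon-denseclus), and the thread does not say whether the library fills in missing keys from a partial dict, so the sketch always builds a complete dict. How the result is handed to the library (for example through a constructor argument) is likewise an assumption to check against the installed version.

```python
import copy

# Presets copied from the question above.
default_umap_params = {
    "categorical": {
        "metric": "hamming",
        "n_neighbors": 30,
        "n_components": 5,
        "min_dist": 0.0,
    },
    "numerical": {
        "metric": "l2",
        "n_neighbors": 30,
        "n_components": 5,
        "min_dist": 0.0,
    },
    "combined": {
        "n_neighbors": 30,
        "min_dist": 0.0,
        "n_components": 5,
    },
}


def override_umap_params(defaults, overrides):
    """Hypothetical helper: return a copy of `defaults` with per-group overrides.

    Groups that are not mentioned in `overrides` keep their preset values.
    """
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise KeyError(f"unknown parameter group(s): {sorted(unknown)}")
    merged = copy.deepcopy(defaults)  # leave the presets untouched
    for group, params in overrides.items():
        merged[group].update(params)
    return merged


# Example: use smaller neighborhoods for the numerical UMAP and for the
# third ("combined") UMAP; the categorical group keeps its presets.
custom_umap_params = override_umap_params(
    default_umap_params,
    {
        "numerical": {"n_neighbors": 15},
        "combined": {"n_neighbors": 15},
    },
)
```

Going by the reply above, the "categorical" and "numerical" groups always take effect because those two UMAPs are always fitted, while the "combined" group only matters when the third UMAP is fitted over the other two.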