umap icon indicating copy to clipboard operation
umap copied to clipboard

fix relations_dictionary problems which prevents from correctly updating aligned_umap

Open hndgzkn opened this issue 2 years ago • 2 comments

Fixes AlignedUMAP.update which does not work correctly.

The relations dictionary is updated with the inverse of the provided relations dictionary. However inverse dictionary should be used only in init_from_existing while generating new_embedding.

The problem is not revealed when the number of entities over updates are equal and the inverse relation is equal to the relation itself. However when the number of entities increase over updates and the inverse relation is not equal to relation itself, the updates do not create a correct map.

Here is an example notebook to see the how Aligned UMAP works over 10 years data where the number of entities increase over time. If 10 years' data is provided to AlignedUMAP.fitat once, it creates correct maps. However if AlignedUMAP.fitis used with partial data and the rest is feeded with AlignedUMAP.update, updates give incorrect mappings.

I have some other suggestions but I would like to ask your comments before proceeding:

  • fit function signature has **fit_params however it only uses relations and window_size among these parameters. It gives the wrong impression that we can set fit parameters. I propose to keep fit function simple as in UMAP.fit and only receive X, y and relations and use window_size from self.alignment_window_size
  • update function does not initialize new mapper with the parameters used upon initialization in fit function. For example initial mappings might use a metric different from default one, but that metric will not be used in the updating mappers. I can change update function to initialize mapper using the same parameters in fit function.
  • are all values in PARAM_NAMES relevant for update? For example can we change metric for update?
  • default n_epochs value used in mapper and fit/update are not equal. Does that make any difference. Why is self.n_epochs value not updated with default value? If there is no specific reason, the value could be set in mapper and aligned_umap could use the value from mapper.

If you agree with the proposed changes, I can add them to this PR.

hndgzkn avatar May 16 '22 16:05 hndgzkn

Thanks for this. Regarding the further questions:

  • The **fit_params is following in line with the latest version of the sklearn API, and is now standard for sklearn models. Eventually the base UMAP models should allow for the same (for, e.g. pipeline purposes).
  • This makes good sense -- let's keep things the same on updates.
  • In principle, yes, you could change metric on update. The goal was to be equivalent to what you might get from fitting fitting a model over varying parameters, and, while it may be odd, it is not entirely unreasonable to align some models based on using different metrics for each of the models.
  • The n_epochs should probably match fit, yes.

If you want to add the changes (keep things the same on updates, make n_epochs match) to this PR that would be great.

lmcinnes avatar May 19 '22 15:05 lmcinnes

Thank you for the answers!

* The `**fit_params` is following in line with the latest version of the sklearn API, and is now standard for sklearn models. Eventually the base UMAP models should allow for the same (for, e.g. pipeline purposes).

Ok. But then it would be good to update the values in fit function. I have not changed anything on this part. I thought of using set_aligned_params to update the values, however it appends the values to existing ones. Probably this should overwrite the existing ones.

* This makes good sense -- let's keep things the same on updates.

This is done

* In principle, yes, you could change metric on update. The goal was to be equivalent to what you might get from fitting fitting a model over varying parameters, and, while it may be odd, it is not entirely unreasonable to align some models based on using different metrics for each of the models.

Ok, doing nothing on that.

* The `n_epochs` should probably match fit, yes.

I thought of how to change this, but it is not that straight forward. I do not have much time to spend on that now. If I find some time, I would make it in another PR.

If you want to add the changes (keep things the same on updates, make n_epochs match) to this PR that would be great.

hndgzkn avatar May 20 '22 15:05 hndgzkn

Hi @lmcinnes ,

I had sometime and I came back to this PR.

Previously I had fixed two problems in the update function:

  • update function was adding the inverse of the relations dict to the list of relations, I've updated to use inverse dict when necessary; ie. passed to init_from_existing
  • the mapper was not initialized with the specified parameters. I've updated to use them.

I've made some additional changes.

  • in fit function, I've added init parameter while initializing the mapper.
  • Both in fit and update n_epochs is set after initializing the mapper and not set as class variable. I'm setting n_epochs before initializing the mapper, thus the mapper is initialized with the same n_epochs value used in aligned_umap.
  • set window_size in update and use it in expand_relations
  • I've added type checking for relations and n_components in update

I think this fixes all the issues that we had discussed above.

hndgzkn avatar Apr 20 '23 13:04 hndgzkn

Sorry for being slow; I'm trying to loop back to this now, and I'll try to review it soon.

lmcinnes avatar Aug 04 '23 18:08 lmcinnes