umap
umap copied to clipboard
fix relations_dictionary problems which prevents from correctly updating aligned_umap
Fixes AlignedUMAP.update
which does not work correctly.
The relations dictionary is updated with the inverse of the provided relations dictionary. However inverse dictionary should be used only in init_from_existing
while generating new_embedding
.
The problem is not revealed when the number of entities over updates are equal and the inverse relation is equal to the relation itself. However when the number of entities increase over updates and the inverse relation is not equal to relation itself, the updates do not create a correct map.
Here is an example notebook to see the how Aligned UMAP works over 10 years data where the number of entities increase over time. If 10 years' data is provided to AlignedUMAP.fit
at once, it creates correct maps. However if AlignedUMAP.fit
is used with partial data and the rest is feeded with AlignedUMAP.update
, updates give incorrect mappings.
I have some other suggestions but I would like to ask your comments before proceeding:
-
fit
function signature has**fit_params
however it only usesrelations
andwindow_size
among these parameters. It gives the wrong impression that we can set fit parameters. I propose to keepfit
function simple as inUMAP.fit
and only receiveX
,y
andrelations
and usewindow_size
fromself.alignment_window_size
-
update
function does not initialize new mapper with the parameters used upon initialization infit
function. For example initial mappings might use ametric
different from default one, but that metric will not be used in the updating mappers. I can changeupdate
function to initialize mapper using the same parameters infit
function. - are all values in
PARAM_NAMES
relevant for update? For example can we changemetric
for update? - default
n_epochs
value used inmapper
andfit
/update
are not equal. Does that make any difference. Why is self.n_epochs value not updated with default value? If there is no specific reason, the value could be set in mapper and aligned_umap could use the value from mapper.
If you agree with the proposed changes, I can add them to this PR.
Thanks for this. Regarding the further questions:
- The
**fit_params
is following in line with the latest version of the sklearn API, and is now standard for sklearn models. Eventually the base UMAP models should allow for the same (for, e.g. pipeline purposes). - This makes good sense -- let's keep things the same on updates.
- In principle, yes, you could change metric on update. The goal was to be equivalent to what you might get from fitting fitting a model over varying parameters, and, while it may be odd, it is not entirely unreasonable to align some models based on using different metrics for each of the models.
- The
n_epochs
should probably match fit, yes.
If you want to add the changes (keep things the same on updates, make n_epochs
match) to this PR that would be great.
Thank you for the answers!
* The `**fit_params` is following in line with the latest version of the sklearn API, and is now standard for sklearn models. Eventually the base UMAP models should allow for the same (for, e.g. pipeline purposes).
Ok. But then it would be good to update the values in fit
function. I have not changed anything on this part. I thought of using set_aligned_params
to update the values, however it appends the values to existing ones. Probably this should overwrite the existing ones.
* This makes good sense -- let's keep things the same on updates.
This is done
* In principle, yes, you could change metric on update. The goal was to be equivalent to what you might get from fitting fitting a model over varying parameters, and, while it may be odd, it is not entirely unreasonable to align some models based on using different metrics for each of the models.
Ok, doing nothing on that.
* The `n_epochs` should probably match fit, yes.
I thought of how to change this, but it is not that straight forward. I do not have much time to spend on that now. If I find some time, I would make it in another PR.
If you want to add the changes (keep things the same on updates, make
n_epochs
match) to this PR that would be great.
Hi @lmcinnes ,
I had sometime and I came back to this PR.
Previously I had fixed two problems in the update
function:
-
update
function was adding the inverse of therelations
dict to the list of relations, I've updated to use inverse dict when necessary; ie. passed toinit_from_existing
- the mapper was not initialized with the specified parameters. I've updated to use them.
I've made some additional changes.
- in
fit
function, I've addedinit
parameter while initializing the mapper. - Both in
fit
andupdate
n_epochs
is set after initializing the mapper and not set as class variable. I'm settingn_epochs
before initializing the mapper, thus the mapper is initialized with the samen_epochs
value used in aligned_umap. - set
window_size
inupdate
and use it inexpand_relations
- I've added type checking for
relations
andn_components
inupdate
I think this fixes all the issues that we had discussed above.
Sorry for being slow; I'm trying to loop back to this now, and I'll try to review it soon.