PyTorch-Model-Compare

Works fine with the whole model but raises "NANs" on selected layers.

Open PengHongyiNTU opened this issue 2 years ago • 3 comments

When I was trying to compare the same model trained on different datasets, I encountered a weird problem:

It works fine when I compare all layers:

cka = CKA(model1, model2, device='cuda', model1_name='model1', model2_name='model2')

But when I try to compare a selected subset of layers:

cka = CKA(model1, model2, device='cuda', model1_name='model1', model2_name='model2', model1_layers=list(model1.state_dict().keys())[:5], model2_layers=list(model2.state_dict().keys())[:5])

it raises:

HSIC computation resulted in NANs

Do you have any idea how to fix this? Thank you very much.

PengHongyiNTU, Sep 14 '22 20:09

In my case, the NaN error was raised when no hooks were created. With no hooks, no features are returned, which results in a divide-by-zero error at L180. It might help to check whether the hooks are being registered properly.
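One way to check this, as a rough sketch: the snippet below assumes that the library selects layers by matching the requested names against `model.named_modules()` (an assumption, not confirmed in this thread). If so, keys taken from `state_dict()` are parameter names such as `0.weight` and would never match a module name such as `0`, so no hook would be registered. `register_hooks` is a hypothetical helper written for illustration, not part of the library.

```python
import torch
import torch.nn as nn

# Toy model standing in for model1 / model2.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 16, 3))

# state_dict() keys are parameter/buffer names, not module names.
param_names = list(model.state_dict().keys())
# named_modules() yields module names; skip the root module (empty name).
module_names = [name for name, _ in model.named_modules() if name]

print(param_names)   # ['0.weight', '0.bias', '2.weight', '2.bias']
print(module_names)  # ['0', '1', '2']


def register_hooks(model, layer_names, features):
    """Hypothetical helper: hook every module whose name is in layer_names."""
    handles = []
    for name, module in model.named_modules():
        if name in layer_names:
            # Bind the current name via a default argument.
            def hook(_module, _inputs, output, name=name):
                features[name] = output
            handles.append(module.register_forward_hook(hook))
    return handles


features = {}
# Parameter names match no module, so nothing is hooked.
print(len(register_hooks(model, param_names[:5], features)))  # 0

# Module names match, so features are collected on the forward pass.
handles = register_hooks(model, module_names, features)
with torch.no_grad():
    model(torch.randn(4, 3, 16, 16))
print(sorted(features))  # ['0', '1', '2']
```

If this is what is happening, passing names taken from `named_modules()` instead of `state_dict().keys()` may be worth trying.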

kssteven418, Oct 16 '22 21:10

I think the issue I am having is that, no matter what, ((N - 1) * (N - 2)) in line 134 ends up equalling 0, because the matrix being passed in is always 2x2. It comes from K = X @ X.t() around line 181.

I discovered this happens when the batch size is <= 2. So if anyone else has this issue this might be why!
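To illustrate why small batches break the computation, here is a minimal NumPy sketch of the standard unbiased HSIC estimator (Song et al., 2012). It is an assumption that the library's line 134 follows this formula; the function name `unbiased_hsic` and the explicit size check are illustrative only. The formula divides by (N - 1)(N - 2), by (N - 2), and by N(N - 3), so under this estimator any batch with fewer than 4 samples hits a zero denominator.

```python
import numpy as np


def unbiased_hsic(K, L):
    """Unbiased HSIC estimate for two N x N Gram matrices (sketch)."""
    N = K.shape[0]
    if N < 4:
        # (N - 2) is zero for N == 2 and N * (N - 3) is zero for N == 3.
        raise ValueError(f"unbiased HSIC needs at least 4 samples per batch, got {N}")
    K = K.copy()
    L = L.copy()
    # The estimator is defined on Gram matrices with a zeroed diagonal.
    np.fill_diagonal(K, 0.0)
    np.fill_diagonal(L, 0.0)
    ones = np.ones(N)
    term1 = np.trace(K @ L)
    term2 = (ones @ K @ ones) * (ones @ L @ ones) / ((N - 1) * (N - 2))
    term3 = 2.0 * (ones @ K @ L @ ones) / (N - 2)
    return (term1 + term2 - term3) / (N * (N - 3))


rng = np.random.default_rng(0)

# Batch of 8 samples: the estimate is finite.
X = rng.normal(size=(8, 16))
K = X @ X.T
print(np.isfinite(unbiased_hsic(K, K)))

# Batch of 2 samples: the 2x2 Gram matrix is rejected up front.
X_small = rng.normal(size=(2, 16))
try:
    unbiased_hsic(X_small @ X_small.T, X_small @ X_small.T)
except ValueError as err:
    print(err)
```

Under that assumption, a batch size of 3 would fail as well, so 4 is the smallest batch size that avoids the NaNs.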

I set this batch size because the method is so slow and memory-intensive. Are there any tricks for running it without large batch sizes or heavy computation?

Maddy12, Nov 03 '22 03:11

I got the same issue: https://github.com/AntixK/PyTorch-Model-Compare/issues/10. Looking for solutions. Thanks!

bryanbocao, Mar 11 '23 16:03