The gradient of h_estimate is not cut down.

Open ZhangXiao96 opened this issue 6 years ago • 12 comments

Nice repo! However, I think the gradient of h_estimate is not cut off (detached), which may lead to some problems.
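For illustration, here is a minimal standalone example (not the repo's code) of what "not cut off" means: when the second autograd.grad call is made with create_graph=True, the returned Hessian-vector product stays attached to the graph, and so does any h_estimate built from it, unless it is explicitly detached.

```python
import torch

# Minimal standalone illustration (not the repo's code): an HVP computed
# with create_graph=True stays attached to the autograd graph, so an
# h_estimate built from it carries history into the next iteration
# unless it is explicitly detached.
w = torch.tensor([1.0], requires_grad=True)
v = torch.ones(1)

loss = (w ** 4).sum()
(g,) = torch.autograd.grad(loss, [w], create_graph=True)
(hv,) = torch.autograd.grad(g, [w], grad_outputs=v, create_graph=True)

h_attached = v + 0.9 * v - hv / 10             # still tracked by autograd
h_detached = (v + 0.9 * v - hv / 10).detach()  # history cut off

assert h_attached.requires_grad
assert not h_detached.requires_grad
```

If the attached version is fed back into the next hvp() call, each iteration differentiates through all previous ones, so the graph (and GPU memory) grows with the recursion depth.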

ZhangXiao96 avatar Mar 08 '20 10:03 ZhangXiao96

Hi @ZhangXiao96, what do you mean by that? Do you have an idea how this could be fixed?

expectopatronum avatar Mar 26 '20 09:03 expectopatronum

Maybe @ZhangXiao96 is talking about what is mentioned here. https://github.com/nimarb/pytorch_influence_functions/issues/5#issue-558332456

ryokamoi avatar Jun 12 '20 05:06 ryokamoi

Hi! Thanks for the great code!
1. I am wondering how much GPU memory this code should take. With recursion_depth = 5000 and r = 10, GPU memory keeps growing until "out of memory". Is this normal?
2. I tried to cut off the gradient of h_estimate, but then the influence function values looked strange.

How can I fix these problems? Thanks in advance.

Here are my experimental details:

1. I ran the CIFAR experiment with the provided simple network (conv-pool-conv-fc-fc-fc-softmax) and the default settings (recursion_depth = 1 and r = 1). → The code seems to work.
2. I changed the parameters to recursion_depth = 5000 and r = 10, still using batch_size = 4 and one 12 GB GPU. → GPU memory grows from 500 MiB until it runs out.
3. I found that the code recursively calculates h_estimate, and this is the source of the growing GPU memory:

h_estimate = hvp(loss, list(model.parameters()), h_estimate)

After reading the issues, I think h_estimate should not be used directly to calculate its own next step, because at each step only the value of h_estimate should be used. If h_estimate stays attached to the graph as a variable, I am not sure whether the hvp() function will end up computing higher-order gradients through it. Therefore I made several changes, such as

h_estimate = hvp(loss, list(model.parameters()), h_estimate)
h_estimate = [_v.detach() + (1 - damp) * _h_e.detach() - _hv.detach() / scale
              for _v, _h_e, _hv in zip(v, h_estimate, hv)]

or

h_estimate = hvp(loss, list(model.parameters()), h_estimate)
with torch.no_grad():
    h_estimate = [_v + (1 - damp) * _h_e - _hv / scale
                  for _v, _h_e, _hv in zip(v, h_estimate, hv)]

But I found that both of these modifications make the influence function blow up to NaN as recursion_depth increases.

zhongyy avatar Jun 16 '20 11:06 zhongyy

Maybe @ZhangXiao96 is talking about what is mentioned here. #5 (comment)

I am not sure it is right to use the initial h_estimate to calculate hvp() at each step. I checked the TF code provided by the author (https://github.com/kohpangwei/influence-release/blob/578bc458b4d7cc39ed7343b9b271a04b60c782b1/influence/genericNeuralNet.py#L475). It seems that h_estimate is updated at each step? (The TF code is hard for me to read, so I may have misunderstood.)

zhongyy avatar Jun 16 '20 11:06 zhongyy

Hi @zhongyy, I agree we should add with torch.no_grad().

However, why did you modify the hvp part? I think hv = hvp(loss, list(model.parameters()), h_estimate) should not be changed.
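Putting the two points together, a self-contained sketch of the fix would look roughly like this. The tiny linear model, the hvp helper, and the damp/scale values here are all illustrative stand-ins, not the repo's actual code; the hvp's second backward is kept differentiable, which is what makes the no_grad necessary.

```python
import torch
import torch.nn as nn

# Illustrative sketch of the proposed fix on a hypothetical tiny model
# (the model, `damp`, and `scale` values are stand-ins, not the repo's).
torch.manual_seed(0)
model = nn.Linear(3, 1)
x, y = torch.randn(8, 3), torch.randn(8, 1)
v = [torch.ones_like(p) for p in model.parameters()]
h_estimate = [t.clone() for t in v]
damp, scale = 0.01, 25.0

def hvp(loss, params, vec):
    # Hessian-vector product via double backward; the second grad call
    # also builds a graph, so its outputs stay attached to autograd.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    prod = sum((g * w).sum() for g, w in zip(grads, vec))
    return torch.autograd.grad(prod, params, create_graph=True)

for _ in range(10):
    loss = nn.functional.mse_loss(model(x), y)
    # Keep `hv = hvp(...)` as in the original code ...
    hv = hvp(loss, list(model.parameters()), h_estimate)
    # ... and block gradient tracking only in the recursive update,
    # so each step uses h_estimate by value, not as a graph node.
    with torch.no_grad():
        h_estimate = [_v + (1 - damp) * _h_e - _hv / scale
                      for _v, _h_e, _hv in zip(v, h_estimate, hv)]

assert all(not h.requires_grad for h in h_estimate)
assert all(torch.isfinite(h).all() for h in h_estimate)
```

With this shape, the autograd graph of each iteration is dropped as soon as the update is done, so memory stays flat across recursion_depth.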

I want to ask more about your NaN error. How can we reproduce it? I have also received unreasonable results in some cases.

(note) This fork may be helpful. https://github.com/dedeswim/pytorch_influence_functions

ryokamoi avatar Jun 16 '20 13:06 ryokamoi

@zhongyy I have the same problem as you. h_estimate keeps increasing over the iterations and eventually becomes NaN. Did you manage to fix this?

wangdi19941224 avatar Jun 22 '20 08:06 wangdi19941224

@zhongyy @wangdi19941224 @ryokamoi did any of you manage to fix the NaN blow-up issue? I face the same thing whenever I wrap the update in torch.no_grad().

iamgroot42 avatar Jul 05 '20 23:07 iamgroot42

@iamgroot42 What kind of model did you use? It is possible to hit the NaN problem even if the code is correct.

One possible solution is to use a larger "scale". The Taylor expansion in LiSSA originally assumes that the largest eigenvalue of H is at most 1. To relax this condition you can use a larger scale, but more iterations will then be required.
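A toy example with an explicit 2x2 Hessian (illustrative numbers, not the repo's code) makes the scale condition concrete: the recursion h ← v + (I − H/scale)·h converges only when the eigenvalues of H/scale lie inside the contraction region, i.e. when scale exceeds the largest eigenvalue of H.

```python
import torch

# Toy illustration: the LiSSA recursion h <- v + (I - H/scale) h has
# fixed point scale * H^{-1} v, and converges only if every eigenvalue
# of H is below scale (contraction factor |1 - lambda/scale| < 1).
H = torch.tensor([[4.0, 0.0], [0.0, 1.0]])  # eigenvalues 4 and 1
v = torch.tensor([1.0, 1.0])

def lissa(scale, steps=200):
    h = v.clone()
    for _ in range(steps):
        h = v + h - H @ h / scale
    return h / scale  # rescale to estimate H^{-1} v

good = lissa(scale=10.0)  # scale > 4: converges to H^{-1} v = [0.25, 1.0]
bad = lissa(scale=1.0)    # scale < 4: blows up to inf/NaN

assert torch.allclose(good, torch.tensor([0.25, 1.0]), atol=1e-3)
assert not torch.isfinite(bad).all()
```

The trade-off is visible here too: the larger the scale relative to the smallest eigenvalue, the closer the contraction factor is to 1, so more iterations are needed.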

ryokamoi avatar Jul 06 '20 04:07 ryokamoi

@ryokamoi it's VGG19.

I did try increasing "scale" to 500 and got rid of the NaNs (for now). Is there a good heuristic for estimating the lowest (or ballpark) value of "scale" that would work well?

iamgroot42 avatar Jul 06 '20 05:07 iamgroot42

@iamgroot42 I think there is no computationally cheap way to get the lowest scale, since we would have to estimate the largest eigenvalue of H.

ryokamoi avatar Jul 06 '20 06:07 ryokamoi

Right. Thanks a lot, @ryokamoi :D

iamgroot42 avatar Jul 06 '20 16:07 iamgroot42

Hi everyone, has any of you managed to solve the NaN problem? I've increased the scale to a very large number, but still get NaN after about 100 iterations.

thongnt99 avatar Dec 14 '20 08:12 thongnt99