
Fails on zero grad

Open · lemmersj opened this issue 5 years ago · 4 comments

In instances where a neuron doesn't factor into the loss (e.g., a component of the loss is disabled for a specific experiment, leaving a neuron or set of neurons unused), autograd leaves the gradients of the unused connections as None. This results in a crash at the line:

param.grad *= 1./float(args['psuedo_batch_loop']*args['batch_size'])

With the error:

TypeError: unsupported operand type(s) for *=: 'NoneType' and 'float'

This can be remedied by inserting if param.grad is not None: before the line in question, but I'm unsure of any upstream consequences.
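
For reference, a minimal sketch of the guarded update (the loop structure and variable names here are assumptions, not the actual train.py code; only the None check is the point):

scale = 1. / float(args['psuedo_batch_loop'] * args['batch_size'])
for param in model.parameters():
    # Parameters that never contributed to the loss have grad == None; skip them.
    if param.grad is not None:
        param.grad *= scale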

lemmersj avatar Aug 20 '19 21:08 lemmersj

That should have been fixed in issue #7 by the following line: https://github.com/MichiganCOG/ViP/blob/dev/train.py#L182.

Do you have this version from dev pulled?

natlouis avatar Aug 20 '19 22:08 natlouis

I'm using an older version (after pulling from master, I immediately made changes that left train.py unmergeable). My mistake for missing that issue.

lemmersj avatar Aug 21 '19 11:08 lemmersj

I came back to this: it appears the modification in the dev branch resolves a different problem. That is, the weights causing the issue for me are not frozen; they have no gradient because they do not contribute to the loss.

Consider three regression nodes: yaw, pitch, and roll. I modify training to regress only yaw by backpropagating on that node directly. The weights leading into the pitch and roll nodes are left as None by autograd after loss.backward(), and thus fail at the cited line.
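
A minimal sketch of what I mean (a hypothetical PyTorch snippet, not the actual ViP model or training code; the head names and sizes are made up):

import torch
import torch.nn as nn

# Three independent regression heads for yaw, pitch, and roll.
class PoseHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.yaw = nn.Linear(16, 1)
        self.pitch = nn.Linear(16, 1)
        self.roll = nn.Linear(16, 1)

    def forward(self, x):
        return self.yaw(x), self.pitch(x), self.roll(x)

model = PoseHead()
yaw, pitch, roll = model(torch.randn(4, 16))

# Only the yaw head enters the loss, so pitch/roll never join the graph.
loss = yaw.pow(2).mean()
loss.backward()

print(model.yaw.weight.grad is None)    # False: yaw received a gradient
print(model.pitch.weight.grad is None)  # True: left as None by autograd
print(model.roll.weight.grad is None)   # True: scaling it in place crashes

Note that the pitch and roll weights still have requires_grad=True, so a check on requires_grad alone (the frozen-weights case) does not catch this.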

lemmersj avatar Sep 17 '19 18:09 lemmersj

Can you post your code? The training script and the relevant loss and model files. A GitHub link would work.

ehofesmann avatar Sep 20 '19 15:09 ehofesmann