
Why is .item() not called on grad norm like on other opt info fields?


In, e.g., PPO (this also applies to at least A2C; I didn't check any others), when OptInfo is being populated, .item() is called on most of the fields but not on the grad norm (see here). Is this on purpose? It makes opt info harder to log when the types aren't consistent across fields: for example, I was getting errors from np.mean because it fails on a list of torch.Tensors when they're on the GPU. I'm happy to submit a PR to change this for whatever algorithms have grad norm in their opt info, but I wanted to check first whether this was intentional and, if so, why.

neighthan avatar May 25 '20 19:05 neighthan

Hi! This is because in PyTorch 1.2, the grad norm is returned as a plain Python float. In later versions of PyTorch, it's a torch object which requires calling .item() to retrieve the value.
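A version-agnostic way to handle this (a sketch, not what rlpyt currently does) is to wrap the return value of clip_grad_norm_ in float(), which works whether it comes back as a Python float or as a 0-d tensor:

```python
import torch

model = torch.nn.Linear(4, 2)
loss = model(torch.randn(3, 4)).sum()
loss.backward()

# clip_grad_norm_ returns a float in older PyTorch versions and a
# 0-d tensor in newer ones; float() handles both cases uniformly.
grad_norm = float(torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0))
```

That way the logged grad norm has the same type as the other .item()-converted fields on every PyTorch version.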

And yes, for logging purposes, whenever there is a torch tensor that may be on the GPU, it should be logged as x.detach().cpu().numpy() within the algorithm. Or, you could modify the runner class to log torch tensors differently. :)
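The conversion described above could be wrapped in a small helper (illustrative only; `loggable` is not an rlpyt function) applied to each opt info field before logging:

```python
import torch

def loggable(x):
    """Make a value safe for numpy-based logging.

    torch.Tensors are detached from the autograd graph, moved to the
    CPU, and converted to numpy; everything else passes through.
    """
    if isinstance(x, torch.Tensor):
        return x.detach().cpu().numpy()
    return x
```

After this, np.mean over a list of logged values works even when some of the original tensors lived on the GPU.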

Does this answer your question?

astooke avatar Jun 30 '20 16:06 astooke