rlpyt
Why is .item() not called on grad norm like on other opt info fields?
In, e.g., PPO (though this also applies to at least A2C; I didn't check any others), when OptInfo is populated, .item() is called on most of the fields but not on grad norm (see here). Is this on purpose? It makes opt info harder to log when the types aren't consistent across fields: for example, I was getting errors from np.mean because it fails on a list of torch.Tensors when they're on the GPU. I'm happy to submit a PR to change this for whatever algorithms have grad norm in their opt info, but I wanted to check first whether this was intentional and, if so, why.
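For illustration, here's a minimal sketch (not rlpyt code) of the kind of type mismatch I ran into:

```python
import numpy as np
import torch

# Fields where .item() was called are plain Python floats, so this works:
loss_vals = [0.5, 0.4, 0.3]
print(np.mean(loss_vals))

# Grad norm values stay as torch.Tensors; if they live on the GPU,
# np.mean cannot convert them. (On CPU this succeeds, which hides the problem.)
device = "cuda" if torch.cuda.is_available() else "cpu"
grad_norms = [torch.tensor(1.2, device=device), torch.tensor(0.9, device=device)]
try:
    print(np.mean(grad_norms))
except TypeError as e:
    print("np.mean failed:", e)
```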
Hi! This is because in PyTorch 1.2, the grad norm is returned as a Python float. In later versions of PyTorch, it's a torch.Tensor, which requires calling .item() to retrieve the value.
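A minimal version-agnostic sketch (not the current rlpyt code) would just coerce whatever clip_grad_norm_ returns:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
loss = model(torch.randn(8, 4)).sum()
loss.backward()

grad_norm = nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

# Older PyTorch returns a Python float; newer versions return a 0-dim tensor.
# Normalize the type before storing it in opt info:
grad_norm = grad_norm.item() if isinstance(grad_norm, torch.Tensor) else float(grad_norm)
print(type(grad_norm), grad_norm)
```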
And yes, for logging purposes, whenever there is a torch tensor that may be on the GPU, it should be logged as x.detach().cpu().numpy() within the algorithm. Or you could modify the runner class to log torch tensors differently. :)
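For example, a small helper along these lines (hypothetical, not part of rlpyt) could be applied to values before logging:

```python
import torch

def to_loggable(x):
    # Move a (possibly GPU) tensor to host memory as a numpy value;
    # leave plain Python numbers untouched.
    if isinstance(x, torch.Tensor):
        return x.detach().cpu().numpy()
    return x

device = "cuda" if torch.cuda.is_available() else "cpu"
print(to_loggable(torch.tensor(0.7, device=device)))  # numpy scalar
print(to_loggable(0.7))                               # float passes through
```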
Does this answer your question?