Modifies loss storage in DDPG, TD3, and Soft Actor Critic

Open prabhatnagarajan opened this issue 1 year ago • 6 comments

As per this discussion:

https://github.com/pfnet/pfrl/pull/195#issuecomment-2082102663

I have switched many of the variable.detach().cpu().numpy() calls to variable.item() in our DDPG/TD3/SAC family of agents.

I have run the training scripts for these agents for a while and checked scores.txt, and the values are indeed being written without issues.
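
For reference, the change is essentially of this shape (the variable names below are placeholders, not the exact ones in the diff):

    import torch

    # Dummy scalar loss standing in for a critic loss computed by one of the agents.
    critic_loss = (torch.randn(4, requires_grad=True) ** 2).mean()
    loss_record = []

    # Old pattern: detach from the graph, move to CPU, convert to a 0-d ndarray.
    loss_record.append(critic_loss.detach().cpu().numpy())

    # New pattern: .item() returns a plain Python float directly.
    loss_record.append(critic_loss.item())

    print(type(loss_record[0]), type(loss_record[1]))
    # <class 'numpy.ndarray'> <class 'float'>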

prabhatnagarajan avatar Apr 13 '24 00:04 prabhatnagarajan

Friendly reminder on this :)

prabhatnagarajan avatar Apr 29 '24 06:04 prabhatnagarajan

The changes will probably not break anything, but can you elaborate on why you suggest them? If it is just for consistency, I think it is better to force recorded losses to be floats, not scalar ndarrays, for simplicity.

muupan avatar Apr 29 '24 07:04 muupan

To fully justify it I would need more data; I did it mainly for consistency. That said, it may require less memory in some instances (some informal experiments showed slight improvements in memory usage, though I can't be sure without a more complete experiment).

If it is just for consistency, I think it is better to force recorded losses to be floats, not scalar ndarrays, for simplicity.

To be clear, we do indeed force them to be floats, right? We simply do loss.detach().cpu().numpy() before casting them to floats.

Do you remember why we did detach().cpu().numpy() originally?

prabhatnagarajan avatar Apr 29 '24 07:04 prabhatnagarajan

To be clear, we do indeed force them to be floats, right? We simply do loss.detach().cpu().numpy() before casting them to floats.

Ah, sorry, I missed the float() around it. Right, both eventually cast losses to floats, so I think they should function in the same way. Directly casting to float could be more efficient since it could skip creating a numpy.ndarray, but I don't know the implementation details of torch.Tensor.__float__.
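
As a behavioural sanity check (not a benchmark), all three routes end up as a Python float:

    import torch

    loss = torch.tensor(0.5, requires_grad=True)

    a = float(loss.detach().cpu().numpy())  # via a 0-d ndarray
    b = float(loss)                         # via Tensor.__float__
    c = loss.item()                         # via Tensor.item()

    print(a, b, c)                    # 0.5 0.5 0.5
    print(type(a), type(b), type(c))  # all <class 'float'>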

Do you remember why we did detach().cpu().numpy() originally?

IIRC I was just unaware that it is possible to directly cast a torch.Tensor to float. After I learned it is safe to cast (https://github.com/pytorch/pytorch/issues/1129), I switched.

I think it is better to choose the simpler way unless there is a memory leak issue with it. There seems to be yet another way of doing this, Tensor.item(), which is perhaps the way now recommended by PyTorch: https://pytorch.org/docs/stable/tensors.html#initializing-and-basic-operations

Use torch.Tensor.item() to get a Python number from a tensor containing a single value:
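
For a single-element tensor that is just, e.g. (my own snippet, not the one from the docs):

    import torch

    print(torch.tensor(2.5).item())    # 2.5, a Python float
    print(torch.tensor([[1]]).item())  # 1, a Python int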

muupan avatar Apr 29 '24 08:04 muupan

Oh, thanks!

Yes, tensor.item() actually seems like exactly what we want. The documentation also states: This operation is not differentiable.

This tells me that it implicitly detaches from the computation graph (which is what we currently do explicitly with detach()).
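
A quick sanity check of that (my own snippet):

    import torch

    loss = (torch.ones(3, requires_grad=True) * 2).sum()

    value = loss.item()        # no explicit detach() needed
    print(type(value), value)  # <class 'float'> 6.0

    # value is a plain Python float, so it holds no reference to the autograd
    # graph; backward() on the original loss still works as usual.
    loss.backward()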

Perhaps I can replace instances of detach().cpu().numpy() with tensor.item()? And check the tests and maybe some runs and so forth?

prabhatnagarajan avatar Apr 29 '24 08:04 prabhatnagarajan

Perhaps I can replace instances of detach().cpu().numpy() with tensor.item()? And check the tests and maybe some runs and so forth?

Sounds good. I think it is ok to write something like self.q_func1_loss_record.append(loss1.item()), since torch.zeros(1, dtype=torch.float32).item() is already a float.
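
To confirm, a one-liner like this prints <class 'float'>:

    import torch

    print(type(torch.zeros(1, dtype=torch.float32).item()))  # <class 'float'>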

muupan avatar Apr 29 '24 08:04 muupan

/test

muupan avatar Aug 04 '24 14:08 muupan

Successfully created a job for commit 41c0e92:

pfn-ci-bot avatar Aug 04 '24 14:08 pfn-ci-bot