Corwin Joy

90 comments by Corwin Joy

@janeyx99 I'm still a bit new to all this, but here is what I see in the stack trace when debugging a restore from checkpoint (as per the above code)....

To move the discussion forward, I have created a PR where this function is simply disabled to see which tests fail. It is at https://github.com/Lightning-AI/pytorch-lightning/pull/20036. Before we move...

Because the optimizer doesn't just hold a single set of parameters. Instead, it holds an array of parameters indicating model parameters that were tried. So, the optimizer restore holds an...

OK. To answer these questions, I have submitted the following PR with an improved test for `_optimizer_to_device`: https://github.com/Lightning-AI/pytorch-lightning/pull/20062

1. Looking at the tests from simply disabling `_optimizer_to_device` we don't see...

@awaelchli @janeyx99 OK. I have investigated further and added more comments and tests to https://github.com/Lightning-AI/pytorch-lightning/pull/20062. These include specific tests for `_optimizer_to_device` behavior as well as confirming that checkpoints are...

Eventually, maybe we can rip out `_optimizer_to_device`, but I would favor a more conservative approach at first.

@awaelchli - thanks so much for the improved and very nice tests! I think this helps clarify the behavior we want. Also, thanks for merging the interim fix so we...

Here is the performance information when using the test code from issue #19955 and continuing from a checkpoint. With the old code many memory synchronizations are forced, with the update...

Thanks @awaelchli. The analysis I have done on the code seems to agree that we can remove this function (see https://github.com/Lightning-AI/pytorch-lightning/issues/19955#issuecomment-2232309700). The remaining points I think are of note:...