Corwin Joy

90 comments by Corwin Joy

I also ran the linter, as suggested by the contribution guide. I didn't see any errors from this addition, but the output came back fairly noisy, suggesting changes...

@szarnyasg Very cool! Does the design team have any questions or feedback? I think this would be a helpful feature for navigating the web documentation.

Hey @szarnyasg, great meeting you at the con today, and it was nice to chat! I thought I'd follow up on this idea, since I never did hear anything...

No problem, thank you for letting me know!

Awesome @szarnyasg! Please let me know if there is anything I can do to help or collaborate.

I can confirm this issue. What happens during a checkpoint is that the optimizer param state is stored (including its CPU or GPU location). But then, when Lightning reloads the param...
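A minimal sketch of what I mean (assuming CUDA is available and a recent PyTorch where Adam stores 'step' as a tensor): the checkpointed state records each tensor's device, and for Adam that is the param's device for the moment buffers but the CPU for 'step':

```
import torch

param = torch.nn.Parameter(torch.randn(2, 2, device="cuda"))
opt = torch.optim.Adam([param])
param.grad = torch.zeros_like(param)
opt.step()

state = opt.state_dict()["state"][0]
print(state["exp_avg"].device)  # cuda:0 - moment buffers live with the param
print(state["step"].device)     # cpu - where non-fused Adam expects it to stay
```

Blindly moving everything in that dict to the training device on reload is what puts 'step' on the GPU.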

One idea for a fix would be to add special handling based on the optimizer class, but it's a bit ugly. Replace: https://github.com/Lightning-AI/pytorch-lightning/blob/709a2a9d3b79b0a436eb2d271fbeecf8a7ba1352/src/lightning/fabric/utilities/optimizer.py#L31 with:

```
def _optimizer_to_device(optimizer: Optimizer, device: _DEVICE)...
```
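To give a sense of the shape (this is my rough sketch, not the exact patch; the module-name heuristic and the `torch.device` parameter type are my simplifications):

```
import torch
from torch.optim import Optimizer

def _optimizer_to_device(optimizer: Optimizer, device: torch.device) -> None:
    # Crude class-based check: only stock torch.optim optimizers are known
    # to keep 'step' on the CPU when fused/capturable are off.
    skip_step = type(optimizer).__module__.startswith("torch.optim")
    for state in optimizer.state.values():
        for key, value in state.items():
            if skip_step and key == "step":
                # Leave 'step' wherever the optimizer created it.
                continue
            if isinstance(value, torch.Tensor):
                state[key] = value.to(device)
```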

@janeyx99 So, as I understand it, the reason for the function `_optimizer_to_device` is that after checkpointing we may need to resume on a different device. That is, we may start training...
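To make the scenario concrete, a sketch (the checkpoint file name is made up, and CUDA is assumed for the "trained on GPU" side):

```
import torch

# Train and checkpoint on GPU.
model = torch.nn.Linear(4, 4).cuda()
opt = torch.optim.Adam(model.parameters())
model(torch.randn(1, 4, device="cuda")).sum().backward()
opt.step()
torch.save({"optimizer": opt.state_dict()}, "ckpt.pt")

# Resume on CPU: the saved state must be relocated to the new device,
# which is the job _optimizer_to_device is there to do.
model2 = torch.nn.Linear(4, 4)
opt2 = torch.optim.Adam(model2.parameters())
opt2.load_state_dict(torch.load("ckpt.pt", map_location="cpu")["optimizer"])
```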

@janeyx99 Thanks! That's actually an interesting idea. My caveat here is that we cannot create the optimizer directly, since we (generically) have only the base Optimizer class (and...
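Roughly what I understand the suggestion to be (my sketch, not the exact proposal): round-trip the existing object through its own state dict, so PyTorch's casting logic handles device placement and we never need the concrete subclass:

```
from torch.optim import Optimizer

def _optimizer_to_device_via_load(optimizer: Optimizer) -> None:
    # Optimizer.load_state_dict casts state tensors to each param's device,
    # so this round-trip relocates the state without optimizer-specific code.
    optimizer.load_state_dict(optimizer.state_dict())
```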

OK. After further testing, unfortunately, the idea of using `load_state_dict` does not work. The special logic there merely leaves the 'step' parameter as-is unless we are using 'fused=True'....
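A minimal check of that behavior (CUDA and a recent PyTorch assumed): with the default fused=False, the round-trip leaves 'step' on whatever device it already occupies instead of casting it back:

```
import torch

param = torch.nn.Parameter(torch.randn(2, 2, device="cuda"))
opt = torch.optim.Adam([param])
param.grad = torch.zeros_like(param)
opt.step()

sd = opt.state_dict()
sd["state"][0]["step"] = sd["state"][0]["step"].cuda()  # simulate a stray move
opt.load_state_dict(sd)
print(opt.state[param]["step"].device)  # still cuda:0 - 'step' was left as-is
```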