If the newer `save` is used, the argument order seems to have changed in https://github.com/pytorch/pytorch/pull/117772:

```python
/home/carlos/nightly-env/lib/python3.10/site-packages/torch/distributed/checkpoint/utils.py:409: UserWarning: The argument order of save has been changed. Please check the document...
```
Technically, lit-gpt hasn't relied on nightly since the 2.2 release. I opened #19463.
Also opened https://github.com/pytorch/pytorch/issues/119802 upstream. We might want to silence these warnings once that issue is resolved.
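If we do, a minimal sketch of how; the message pattern is an assumption based on the warning text above and may change upstream:

```python
import warnings

# Match the start of the UserWarning emitted by torch.distributed.checkpoint.
# The message text is copied from the warning above (it is matched as a regex
# against the beginning of the message).
warnings.filterwarnings("ignore", message="The argument order of save has been changed")
```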
https://github.com/pytorch/pytorch/issues/119800#issuecomment-1942156271 suggests that (in 2.2+) we should replace most of what we have with the `{get,set}_{model,optimizer}_state_dict` functions in https://github.com/pytorch/pytorch/blob/v2.2.0/torch/distributed/checkpoint/state_dict.py.
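For reference, a minimal sketch of what that replacement could look like. The toy module, the dummy step, and the `full_state_dict`/`cpu_offload` options are illustrative; a real integration would pass the actual (possibly FSDP-wrapped) module and plug into our checkpoint plumbing:

```python
import torch
from torch.distributed.checkpoint.state_dict import (
    StateDictOptions,
    get_model_state_dict,
    get_optimizer_state_dict,
    set_model_state_dict,
    set_optimizer_state_dict,
)

# Toy module and optimizer standing in for the real (FSDP-wrapped) model.
model = torch.nn.Linear(4, 4)
optimizer = torch.optim.AdamW(model.parameters())
model(torch.randn(2, 4)).sum().backward()
optimizer.step()  # populate the optimizer state

# Saving: the getters return state dicts that work uniformly for plain,
# DDP-, and FSDP-wrapped modules.
options = StateDictOptions(full_state_dict=True, cpu_offload=True)
model_sd = get_model_state_dict(model, options=options)
optim_sd = get_optimizer_state_dict(model, optimizer, options=options)

# Loading: the setters mirror the getters.
set_model_state_dict(model, model_sd, options=options)
set_optimizer_state_dict(model, optimizer, optim_state_dict=optim_sd, options=options)
```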
We have support for a limited set of scripts at https://github.com/Lightning-AI/litgpt/tree/main/xla. Give it a shot; it should work on a v4-32. Some of the information may be outdated.
@rasbt We follow the same initialization as Microsoft's loralib (https://github.com/microsoft/LoRA/blob/main/loralib/layers.py#L266-L271), which itself matches what you propose: https://github.com/pytorch/pytorch/blob/main/torch/nn/modules/linear.py#L106-L109.
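Concretely, a minimal sketch of that scheme, matching the linked loralib lines; the rank and feature sizes here are illustrative, not our actual layer definitions:

```python
import math
import torch
import torch.nn as nn

# Illustrative shapes: r is the LoRA rank, in/out features of the adapted layer.
r, in_features, out_features = 8, 512, 512
lora_A = nn.Parameter(torch.empty(r, in_features))
lora_B = nn.Parameter(torch.empty(out_features, r))

# Same init as loralib (and as nn.Linear.reset_parameters for the weight):
# A gets Kaiming-uniform, B starts at zero, so the LoRA delta B @ A is zero
# at initialization and training starts from the pretrained weights.
nn.init.kaiming_uniform_(lora_A, a=math.sqrt(5))
nn.init.zeros_(lora_B)
```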
This stub is only defined so it appears in the docs. Removing kwargs will mean this now raises:

```python
class MyModel(LightningModule):
    def forward(self, *inputs, **kwargs):
        return super().forward(*inputs, **kwargs)

m =...
```
This can be based on https://github.com/EleutherAI/cookbook/blob/main/calc/calc_transformer_mem.py or https://vram.asmirnov.xyz/. It could run at the beginning of the training script or be a separate script that you call. (from #920)
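A rough sketch of the kind of estimate such a script could produce. The byte counts assume bf16 weights/gradients and Adam-style fp32 moments, and the formula is a deliberate simplification; the linked calculators also model activations, buffers, and parallelism:

```python
def estimate_training_memory_gib(
    num_params: float,
    weight_bytes_per_param: int = 2,     # bf16 weights
    grad_bytes_per_param: int = 2,       # bf16 gradients
    optimizer_bytes_per_param: int = 8,  # Adam: two fp32 moments
) -> float:
    """Very rough lower bound: weights + gradients + optimizer state.

    Activations, master weight copies, and fragmentation are ignored;
    the linked calculators account for those as well.
    """
    total_bytes = num_params * (
        weight_bytes_per_param + grad_bytes_per_param + optimizer_bytes_per_param
    )
    return total_bytes / 1024**3

# Example: a 7B-parameter model needs at least ~78 GiB before activations.
print(f"{estimate_training_memory_gib(7e9):.1f} GiB")
```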
This is blocked by https://github.com/pytorch/xla/issues/4988
Another FAQ entry would be support for dynamic shapes.