Carlos Mocholí
When we support and test compiling fwd-bwd-step together, we might want to reimplement this as a transform. But for the current pattern, where gradient clipping happens outside of the...
Oh yes, perfect. I was happy with just not erroring out, because otherwise we would need to comment this out in Fabric if we want to compile the forward and the...
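For reference, here's a minimal sketch of the pattern being discussed, written in plain PyTorch rather than Fabric's actual internals: the forward (and, via AOTAutograd, its backward) is compiled, while gradient clipping and the optimizer step run eagerly outside the compiled region.

```python
import torch

model = torch.nn.Linear(1024, 1024)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Compile only the forward; AOTAutograd also compiles the matching backward.
compiled_model = torch.compile(model)

batch = torch.randn(8, 1024)
loss = compiled_model(batch).sum()
loss.backward()

# Gradient clipping and the step stay outside the compiled graph.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
optimizer.zero_grad()
```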
We do guarantee this already, with the only exception of the `ModelCheckpoint` callback, which gets moved to the end. That said, we recommend not relying on this ordering if possible. Are you asking...
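To illustrate (a rough sketch, not an exhaustive account of the callbacks the Trainer adds by default): callbacks run in the order they are passed, except `ModelCheckpoint` instances, which are moved to the end of the list.

```python
from lightning.pytorch import Trainer
from lightning.pytorch.callbacks import EarlyStopping, LearningRateMonitor, ModelCheckpoint

# User order: ModelCheckpoint first, then the others.
trainer = Trainer(
    callbacks=[ModelCheckpoint(), EarlyStopping(monitor="val_loss"), LearningRateMonitor()]
)

# The given order is preserved, but ModelCheckpoint is reordered to run last
# (the list also contains a few default callbacks such as the progress bar).
print([type(cb).__name__ for cb in trainer.callbacks])
```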
I wouldn't touch anything here. This is a feature (as already noted), and I don't expect anybody to go ahead and set this strange flag that most won't understand why...
Hi! The `setup` that you shared in your first snippet is very different from the `setup` in https://github.com/Lightning-AI/lit-gpt/blob/main/pretrain/tinyllama.py#L66. Can you share all the changes that you made to the repo? You...
My personal thoughts: I was surprised when I saw that torchtitan uses the simple and overoptimistic "academic" flops formula (https://github.com/pytorch/torchtitan/blob/main/torchtitan/utils.py#L231) considering that `torch.utils.flop_counter.FlopCounterMode` already exists (and in my experience, works...
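For comparison, a small sketch (hypothetical toy model, numbers only meant to land in the same ballpark) of measuring FLOPs with `FlopCounterMode` versus the usual "6 · params · tokens" style estimate:

```python
import torch
from torch.utils.flop_counter import FlopCounterMode

# Toy dense model and batch, just to compare the two approaches.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 1024)
)
x = torch.randn(8, 1024)

# Measured: FlopCounterMode traces the ops actually executed (fwd + bwd here).
with FlopCounterMode(display=False) as counter:
    model(x).sum().backward()
measured = counter.get_total_flops()

# "Academic" estimate: ~6 * parameters * tokens for forward + backward.
n_params = sum(p.numel() for p in model.parameters())
estimated = 6 * n_params * x.shape[0]

print(f"measured={measured:.3e}, estimated={estimated:.3e}")
```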
I'd suggest `RuntimeError`
Sorry, what I meant is that I **want** to skip the lr. I don't want to expose it to the command line or config. My code will set a value...
(Leaving my thoughts in writing after discussing online.) The cross-reduction does make sense considering what's supposed to happen with `add_dataloader_idx=False`:

```python
add_dataloader_idx: if ``True``, appends the index of the current dataloader...
```
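As a concrete sketch (toy module, not taken from the linked thread) of what `add_dataloader_idx` controls when logging from a step that runs over multiple val dataloaders:

```python
import torch
import lightning as L


class ToyModel(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 1)

    def validation_step(self, batch, batch_idx, dataloader_idx=0):
        loss = self.layer(batch).mean()
        # add_dataloader_idx=True (default): one key per dataloader, e.g.
        # "val_loss/dataloader_idx_0", "val_loss/dataloader_idx_1", ...
        # add_dataloader_idx=False: every dataloader logs into the same
        # "val_loss" key, so the values get reduced together across dataloaders.
        self.log("val_loss", loss, add_dataloader_idx=False)
```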
> how do we figure out for which dataloader to use the metric from while monitoring the checkpoint callback?

This is an inherent limitation of the design, where `trainer.callback_metrics` is...
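If helpful, a small sketch of how one would typically point the checkpoint callback at a single dataloader's metric, assuming the default `add_dataloader_idx=True` key suffix (the metric name here is hypothetical):

```python
from lightning.pytorch.callbacks import ModelCheckpoint

# With add_dataloader_idx=True, each dataloader's value lands under a suffixed
# key in trainer.callback_metrics, so the monitor has to name one explicitly.
ckpt = ModelCheckpoint(monitor="val_loss/dataloader_idx_0", mode="min")
```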