Carlos Mocholí
When we support and test compiling fwd-bwd-step together, we might want to reimplement this as a transform. But for the current pattern, where gradient clipping happens outside of the...
Oh yes, perfect. I was happy with just not erroring out, because otherwise we would need to comment this out in Fabric if we want to compile the forward and the...
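For reference, here's a minimal sketch of the pattern being discussed, written in plain PyTorch rather than Fabric's actual internals: the forward (and, via AOTAutograd, its backward) is compiled, while gradient clipping and the optimizer step run eagerly outside the compiled region.

```python
import torch

model = torch.nn.Linear(1024, 1024)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Compile only the forward; AOTAutograd also compiles the matching backward.
compiled_model = torch.compile(model)

batch = torch.randn(8, 1024)
loss = compiled_model(batch).sum()
loss.backward()

# Gradient clipping and the step stay outside the compiled graph.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
optimizer.zero_grad()
```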
We do guarantee this already, with the only exception of the `ModelCheckpoint` callback, which gets moved to the end. That said, we recommend not relying on this ordering if possible. Are you asking...
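To illustrate (a rough sketch, not an exhaustive account of the callbacks the Trainer adds by default): callbacks run in the order they are passed, except `ModelCheckpoint` instances, which are moved to the end of the list.

```python
from lightning.pytorch import Trainer
from lightning.pytorch.callbacks import EarlyStopping, LearningRateMonitor, ModelCheckpoint

# User order: ModelCheckpoint first, then the others.
trainer = Trainer(
    callbacks=[ModelCheckpoint(), EarlyStopping(monitor="val_loss"), LearningRateMonitor()]
)

# The given order is preserved, but ModelCheckpoint is reordered to run last
# (the list also contains a few default callbacks such as the progress bar).
print([type(cb).__name__ for cb in trainer.callbacks])
```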
I wouldn't touch anything here. This is a feature (as already noted), and I don't expect anybody to go ahead and set this strange flag that most won't understand why...
Hi! The `setup` that you shared in your first snippet is very different from the `setup` in https://github.com/Lightning-AI/lit-gpt/blob/main/pretrain/tinyllama.py#L66. Can you share all the changes that you made to the repo? You...
My personal thoughts: I was surprised when I saw that torchtitan uses the simple and overoptimistic "academic" flops formula (https://github.com/pytorch/torchtitan/blob/main/torchtitan/utils.py#L231) considering that `torch.utils.flop_counter.FlopCounterMode` already exists (and in my experience, works...
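For comparison, a small sketch (hypothetical toy model, numbers only meant to land in the same ballpark) of measuring FLOPs with `FlopCounterMode` versus the usual "6 · params · tokens" style estimate:

```python
import torch
from torch.utils.flop_counter import FlopCounterMode

# Toy dense model and batch, just to compare the two approaches.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 1024)
)
x = torch.randn(8, 1024)

# Measured: FlopCounterMode traces the ops actually executed (fwd + bwd here).
with FlopCounterMode(display=False) as counter:
    model(x).sum().backward()
measured = counter.get_total_flops()

# "Academic" estimate: ~6 * parameters * tokens for forward + backward.
n_params = sum(p.numel() for p in model.parameters())
estimated = 6 * n_params * x.shape[0]

print(f"measured={measured:.3e}, estimated={estimated:.3e}")
```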
I'd suggest `RuntimeError`
Sorry, what I meant is that I **want** to skip the lr. I don't want to expose it to the command line or config. My code will set a value...
(Leaving my thoughts in writing after discussing online.) The cross-reduction does make sense considering what's supposed to happen with `add_dataloader_idx=False`:

```python
add_dataloader_idx: if ``True``, appends the index of the current dataloader...
```
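As a concrete sketch (toy module, not taken from the linked thread) of what `add_dataloader_idx` controls when logging from a step that runs over multiple val dataloaders:

```python
import torch
import lightning as L


class ToyModel(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 1)

    def validation_step(self, batch, batch_idx, dataloader_idx=0):
        loss = self.layer(batch).mean()
        # add_dataloader_idx=True (default): one key per dataloader, e.g.
        # "val_loss/dataloader_idx_0", "val_loss/dataloader_idx_1", ...
        # add_dataloader_idx=False: every dataloader logs into the same
        # "val_loss" key, so the values get reduced together across dataloaders.
        self.log("val_loss", loss, add_dataloader_idx=False)
```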
> how do we figure out for which dataloader to use the metric from while monitoring the checkpoint callback?

This is an inherent limitation of the design, where `trainer.callback_metrics` is...
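If helpful, a small sketch of how one would typically point the checkpoint callback at a single dataloader's metric, assuming the default `add_dataloader_idx=True` key suffix (the metric name here is hypothetical):

```python
from lightning.pytorch.callbacks import ModelCheckpoint

# With add_dataloader_idx=True, each dataloader's value lands under a suffixed
# key in trainer.callback_metrics, so the monitor has to name one explicitly.
ckpt = ModelCheckpoint(monitor="val_loss/dataloader_idx_0", mode="min")
```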