Carlos Mocholí
If we support cross-reduction, then using the same key for multiple dataloaders is not an error but a feature, as it would be the mechanism to do it.
If the 10 dataloaders use the same key and we support (3), the process would be:
1. Wait for training end
2. Take the average over the 10 values (by...
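For illustration, a minimal sketch of that cross-reduction, assuming each of the 10 dataloaders has already produced one reduced value under the shared key (the key name and values below are made up):

```python
import torch

# Hypothetical per-dataloader results collected at the end of training,
# all logged by the 10 dataloaders under the same key.
per_dataloader = {f"val_metric/dataloader_idx_{i}": torch.rand(()) for i in range(10)}

# Cross-reduction: average the 10 values and report a single value for the shared key.
cross_reduced = torch.stack(list(per_dataloader.values())).mean()
print({"val_metric": cross_reduced.item()})
```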
Not for now: https://github.com/NVIDIA/TransformerEngine/issues/401
TransformerEngine, and then we would need to integrate whatever is changed into Lightning. cc @sbhavani in case you know about the progress for this
@sbhavani I see https://github.com/NVIDIA/TransformerEngine/tree/main/examples/pytorch/fsdp exists now. Is your last comment still valid?
So do you want me to add a job config argument for `with_stack` only, or for both?
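For reference, `with_stack` here refers to the PyTorch profiler flag that records Python stack traces per op; the job config argument would presumably just forward that boolean. A minimal sketch with a placeholder model:

```python
import torch
from torch import nn
from torch.profiler import profile, ProfilerActivity

# Placeholder model and inputs, just to have something to profile.
model = nn.Linear(16, 16)
inputs = torch.randn(4, 16)

# `with_stack=True` records source information / Python stack traces for each op.
with profile(activities=[ProfilerActivity.CPU], with_stack=True) as prof:
    model(inputs)

# Grouping by stack makes the extra information visible in the report.
print(prof.key_averages(group_by_stack_n=5).table(sort_by="cpu_time_total"))
```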
I can work around this by setting

```python
tmodel._lc_cd.process_group_for_ddp = tmodel._lc_cd.fn.process_group_for_ddp
```

since `thunder` gets this information at `jit()` time: https://github.com/Lightning-AI/lightning-thunder/blob/94c94948b79875ba5247b5c986afa088d970a49d/thunder/common.py#L224-L226

So my question is: could we delay accessing this...
> Currently the ddp transformation is applied during the JITing (i.e. while the interpreter runs). This is fundamentally incompatible with what you're trying to do.

I would appreciate some pointers...
We still need to support `jit(ddp(model))`, as this is basically what happens whenever you jit a function and not the model. What I'm advocating for is something like `jit(ddp(undo_jit(jit(model))))`. Where...
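As a rough sketch of that flow (assuming a process group is already initialized; `undo_jit` is hypothetical and does not exist in thunder today, while `thunder.jit` and `thunder.distributed.ddp` are the real entry points):

```python
from torch import nn
import thunder
from thunder.distributed import ddp

# Assumes torch.distributed is initialized (e.g. launched via torchrun).
model = nn.Linear(16, 16)  # placeholder module

# Supported today: the ddp transform is applied before jitting.
tmodel = thunder.jit(ddp(model))

def undo_jit(jitted):
    # Hypothetical helper, sketch only: recover the module that was handed to
    # thunder.jit(). `_lc_cd.fn` is where the wrapped callable lives per the
    # workaround above; thunder does not expose an official API for this.
    return jitted._lc_cd.fn

# Proposed flow: jit happens first (e.g. inside a framework), then ddp is applied.
jitted = thunder.jit(model)
tmodel = thunder.jit(ddp(undo_jit(jitted)))
```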
> I would like to train a lit-gpt model with a context length of 4096. I want to confirm that the only thing I need to do is to modify...