Alexander Zhipa
### Bug description Running `LearningRateFinder` leads to `teardown()` on training epoch loop's results being moved to "cpu" [here](https://github.com/Lightning-AI/pytorch-lightning/blob/master/src/lightning/pytorch/loops/training_epoch_loop.py#L314). The problem is that loop results are only moved to device when...
### Description & Motivation It's not clear why it's currently disabled [here](https://github.com/Lightning-AI/pytorch-lightning/blob/c235f20e7131af2c7be4cc9080d3c946d93d58ea/src/lightning/pytorch/callbacks/batch_size_finder.py#L137). ### Pitch There should not be a big difference in how it works vs. LR finder. E.g. all...
*Issue #, if available:* N/A *Description of changes:* Adds PyTorch Lightning support, `LightningModule` and `LightningDataModule`, for the Node GNN model, and also a Jupyter notebook demonstrating how it works. By submitting...
Adding Python implementation, fixing a minor grammatical error.
> [!IMPORTANT] > The `Update branch` button must only be pressed in very rare occasions. > An outdated branch is never blocking the merge of a PR. > Please reach...
Make `ChainedOptimizer` honor `log_num_zeros_in_grad` to keep the behavior consistent in case we silently end up using it, e.g. when using EP>1.
Adding Python implementation, fixing a minor grammatical error. Recreating https://github.com/cp-algorithms/cp-algorithms/pull/1336 against the new base (`main`).
A simple merge for the list of all registered schedulers. Test plan: [x] all existing tests should pass
## Description The default timeout for c10d rdzv is 60 seconds. The more nodes are used to run a job, the less likely it is to get them aligned...