pytorch2lightning
Converting PyTorch 2 Lightning Examples
The repository will show you how to:
- Convert a pure PyTorch Convolutional Neural Network Classifier trained on MNIST to PyTorch Lightning.
- Extend pure PyTorch trivially with Lightning best-practice features.
- Seamlessly scale your training in the cloud with Grid.ai - No code changes.
- Learn about Lightning Flash and its 15+ production-ready tasks.
Below, find the PyTorch Community Voices | PyTorch Lightning episode in which William Falcon & Thomas Chaton present this repository.
Bare MNIST Classifier

- PyTorch | 127 lines
- Lightning | 101 lines
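For reference, a minimal sketch of what the converted classifier looks like, assuming the Lightning 1.4-era API (the tiny linear backbone is illustrative, not the repository's exact network):

import torch
import torch.nn.functional as F
import pytorch_lightning as pl

class MNISTClassifier(pl.LightningModule):
    # The training loop, device placement, and logging move into hooks;
    # the Trainer drives them.
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(28 * 28, 10)  # placeholder backbone

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self.layer(x.view(x.size(0), -1)), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

Training then reduces to something like pl.Trainer(max_epochs=2).fit(MNISTClassifier(), train_loader) with a standard MNIST DataLoader.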
Add DDP Support
- PyTorch | 184 lines
- Lightning | 102 lines: -82 lines
Add DDP Spawn Support
- PyTorch | 196 lines
- Lightning | 105 lines: -91 lines
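The ~90 extra pure-PyTorch lines come from process-group boilerplate that Lightning hides behind a single accelerator flag. A rough sketch of that boilerplate (the gloo backend, address, and port are illustrative choices; the model setup is elided):

import os
import torch.distributed as dist
import torch.multiprocessing as mp

def train(rank, world_size):
    # each spawned process must join the process group before training
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "12355")
    dist.init_process_group("gloo", rank=rank, world_size=world_size)
    # ... build the model, wrap it in DistributedDataParallel,
    # shard the data with DistributedSampler, run the loop ...
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 2
    mp.spawn(train, args=(world_size,), nprocs=world_size)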
Add Accumulated Gradients Support
- PyTorch | 198 lines
- Lightning | 106 lines: -92 lines
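In pure PyTorch, accumulating gradients means scaling the loss and stepping the optimizer every N batches. A minimal sketch of the pattern those extra lines implement (the tiny model and random data are toy placeholders):

import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = torch.nn.CrossEntropyLoss()
batches = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(16)]

accumulate_grad_batches = 4
optimizer.zero_grad()
for batch_idx, (x, y) in enumerate(batches):
    # scale the loss so the accumulated step matches a large-batch step
    loss = criterion(model(x), y) / accumulate_grad_batches
    loss.backward()
    if (batch_idx + 1) % accumulate_grad_batches == 0:
        optimizer.step()
        optimizer.zero_grad()

In Lightning, the same behavior is a single flag: Trainer(accumulate_grad_batches=4).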
Add Profiling Support
- PyTorch | 226 lines
- Lightning | 106 lines: -120 lines
Add DeepSpeed, FSDP, Multiple Loggers, Multiple Profilers, TorchScript, Loop Customization, Fault Tolerant Training, etc.
- PyTorch | requires a huge number of additional lines. You definitely do not want to do that :tired_face:
- PyTorch Lightning | still ~106 lines. Let's keep it simple. :rocket:
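A hedged sketch of how the features above map onto Trainer arguments in the Lightning 1.4-era API (the LightningModule itself stays unchanged):

import pytorch_lightning as pl

trainer = pl.Trainer(
    max_epochs=2,
    gpus=4,
    accelerator="ddp",            # or "ddp_spawn" / "ddp_sharded"
    accumulate_grad_batches=4,    # accumulated gradients
    profiler="simple",            # built-in profiling
    precision=16,                 # mixed precision
    # plugins="deepspeed_stage_3",  # DeepSpeed, if installed
)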
Learn more with the Lightning Docs.
PyTorch Lightning 1.4 is out! Here is our CHANGELOG.
Don't forget to :star: PyTorch Lightning.
Training on Grid.ai
Grid.ai is an ML platform from the creators of PyTorch Lightning that enables you to train machine learning code without worrying about infrastructure.
Learn more with the Grid.ai Docs.
1. Install Lightning-Grid
pip install lightning-grid --upgrade
2. Seamlessly Train 100s of Machine Learning Models on the Cloud From Your Laptop - No Code Changes
grid run --instance_type 4_M60_8gb ddp_mnist_grid/lightning.py --trainer.max_epochs 2 --trainer.gpus 4 --trainer.accelerator ddp
With Grid Datastores, you get low-latency, highly scalable, auto-versioned datasets:
grid datastore create --name mnist --source data
grid run --instance_type 4_M60_8gb --datastore_name mnist --datastore_mount_dir data ddp_mnist_grid/lightning.py --trainer.max_epochs 2 --trainer.gpus 4 --trainer.accelerator ddp
Pure PyTorch:
grid datastore create --name mnist --source data
grid run --instance_type g4dn.xlarge --gpus 2 ddp_mnist_grid/boring_pytorch.py
Add --use_spot to use interruptible machines.
Grid.ai makes scaling multi-node training easy :rocket: Train on 2 nodes with 4 GPUs each using DDP Sharded :fire:
grid run --instance_type 4_M60_8gb --gpus 8 --datastore_name mnist --datastore_mount_dir data ddp_mnist_grid/lightning.py --trainer.max_epochs 2 --trainer.num_nodes 2 --trainer.gpus 4 --trainer.accelerator ddp_sharded
Train Andrej Karpathy's minGPT, converted to PyTorch Lightning by @williamFalcon and benchmarked with DeepSpeed by @SeanNaren.
git clone https://github.com/SeanNaren/minGPT.git
cd minGPT
git checkout benchmark
grid run --instance_type g4dn.12xlarge --gpus 8 benchmark.py --n_layer 6 --n_head 16 --n_embd 2048 --gpus 4 --num_nodes 2 --precision 16 --batch_size 32 --plugins deepspeed_stage_3
Learn how to scale your scripts with PyTorch Lightning + DeepSpeed.
Lightning Flash
Lightning Flash is a collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning, built on top of PyTorch Lightning.
Train a PyTorchVideo classifier with Lightning Flash. Check out the Grid.ai reproducible button:
import os
import flash
from flash.core.data.utils import download_data
from flash.video import VideoClassificationData, VideoClassifier
# 1. Create the DataModule
# Find more datasets at https://pytorchvideo.readthedocs.io/en/latest/data.html
download_data("https://pl-flash-data.s3.amazonaws.com/kinetics.zip", "./data")
datamodule = VideoClassificationData.from_folders(
    train_folder=os.path.join(os.getcwd(), "data/kinetics/train"),
    val_folder=os.path.join(os.getcwd(), "data/kinetics/val"),
    clip_sampler="uniform",
    clip_duration=1,
    decode_audio=False,
)
# 2. Build the task
model = VideoClassifier(backbone="x3d_xs", num_classes=datamodule.num_classes, pretrained=False)
# 3. Create the trainer and finetune the model
trainer = flash.Trainer(max_epochs=3)
trainer.finetune(model, datamodule=datamodule, strategy="freeze")
# 4. Make a prediction
predictions = model.predict(os.path.join(os.getcwd(), "data/kinetics/predict"))
print(predictions)
# 5. Save the model!
trainer.save_checkpoint("video_classification.pt")
Credits
Credit to the PyTorch Team for providing the Bare MNIST example.
Credit to Andrej Karpathy for providing an implementation of minGPT.
Troubleshooting
Kill DDP processes
sudo kill -9 $(ps -aef | grep -i 'ddp' | grep -v 'grep' | awk '{ print $2 }')
