Converting PyTorch 2 Lightning Examples

This repository shows you how to:

Convert a pure PyTorch Convolutional Neural Network Classifier trained on MNIST to PyTorch Lightning.
Extend Pure PyTorch trivially with Lightning best practice features.
Seamlessly scale your training in the cloud with Grid.ai - No code changes.
Learn about Lightning Flash and its 15+ production-ready tasks.

Watch "PyTorch Community Voices | PyTorch Lightning | William Falcon & Thomas Chaton" for a presentation of this repository.

Bare MNIST Classifier

MNIST Dataset

PyTorch | 127 lines
Lightning | 101 lines

Add DDP Support

PyTorch | 184 lines
Lightning | 102 lines: -82 lines
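Much of the extra PyTorch code for DDP deals with process setup and with giving each process its own slice of the dataset. As a rough illustration of the sharding idea only, here is a minimal pure-Python sketch. The function name `shard_indices` is hypothetical, shuffling is left out, and it is modelled on the padding-and-striding behaviour of a distributed sampler rather than taken from this repository.

```python
import math
from typing import List


def shard_indices(num_samples: int, world_size: int, rank: int) -> List[int]:
    """Return the dataset indices one rank would see in an epoch (no shuffling)."""
    if num_samples <= 0 or world_size <= 0 or not 0 <= rank < world_size:
        raise ValueError("invalid num_samples, world_size or rank")
    per_rank = math.ceil(num_samples / world_size)
    total = per_rank * world_size
    # Pad by wrapping around so that every rank gets the same number of samples.
    padded = [i % num_samples for i in range(total)]
    # Each rank takes every world_size-th index, starting at its own rank.
    return padded[rank:total:world_size]


for rank in range(4):
    print(rank, shard_indices(10, 4, rank))
```

With 10 samples and 4 processes, every rank receives 3 indices, and two samples are repeated to keep the shards the same size.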

Add DDP Spawn Support

PyTorch | 196 lines
Lightning | 105 lines: -91 lines

Add Accumulated Gradients Support

PyTorch | 198 lines
Lightning | 106 lines: -92 lines
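Gradient accumulation sums the gradients of several small batches before taking one optimizer step, which emulates a larger batch. In Lightning this is controlled by the `accumulate_grad_batches` argument of the `Trainer`. The sketch below is not code from this repository. It uses a toy one-parameter model (`y_hat = w * x`) and hypothetical helper names to show that averaging micro-batch gradients reproduces the full-batch gradient, assuming equally sized micro-batches.

```python
from typing import Sequence


def grad_mse(w: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Gradient of the mean squared error of y_hat = w * x with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w: float, xs: Sequence[float], ys: Sequence[float],
                     accumulate_steps: int) -> float:
    """Accumulate gradients over equally sized micro-batches."""
    if len(xs) % accumulate_steps != 0:
        raise ValueError("batch size must be divisible by accumulate_steps")
    micro = len(xs) // accumulate_steps
    total = 0.0
    for step in range(accumulate_steps):
        lo, hi = step * micro, (step + 1) * micro
        # Dividing by accumulate_steps mirrors scaling the loss before backward().
        total += grad_mse(w, xs[lo:hi], ys[lo:hi]) / accumulate_steps
    return total


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2 * x + 1 for x in xs]
print(grad_mse(0.5, xs, ys), accumulated_grad(0.5, xs, ys, accumulate_steps=4))
```

The two printed values agree up to floating-point rounding, which is why accumulation can stand in for a larger batch when memory is limited.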

Add Profiling Support

PyTorch | 226 lines
Lightning | 106 lines: -120 lines
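Profiling support in hand-written PyTorch usually means wrapping each phase of the training loop in timing code. The following is a minimal sketch of that idea using only the standard library. `ToyProfiler` and the action names are hypothetical and are not the profiler classes shipped with Lightning.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class ToyProfiler:
    """Record how long each named action takes, across repeated calls."""

    def __init__(self):
        self.durations = defaultdict(list)

    @contextmanager
    def profile(self, action_name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the wrapped block raises.
            self.durations[action_name].append(time.perf_counter() - start)

    def summary(self):
        """Map each action name to (number of calls, total seconds)."""
        return {name: (len(t), sum(t)) for name, t in self.durations.items()}


profiler = ToyProfiler()
for _ in range(3):
    with profiler.profile("training_step"):
        sum(i * i for i in range(10_000))  # stand-in for real work
print(profiler.summary())
```

Repeating this wrapper around data loading, forward, backward, and optimizer steps accounts for much of the extra line count on the PyTorch side.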

Add DeepSpeed, FSDP, Multiple Loggers, Multiple Profilers, TorchScript, Loop Customization, Fault Tolerant Training, etc.

PyTorch | Requires a huge number of additional lines. You definitely do not want to do that :tired_face:
PyTorch Lightning | Still ~ 106 lines. Let's keep it simple. :rocket:

Learn more with Lightning Docs.

PyTorch Lightning 1.4 is out! Here is our CHANGELOG.

Don't forget to :star: PyTorch Lightning.

Training on Grid.ai

Grid.ai is an ML platform from the creators of PyTorch Lightning that lets you train machine learning models without worrying about infrastructure.

Learn more with Grid.ai Docs

1. Install Lightning-Grid

pip install lightning-grid --upgrade

2. Seamlessly train 100s of Machine Learning models on the cloud from your laptop - no code changes

grid run --instance_type 4_M60_8gb ddp_mnist_grid/lightning.py --trainer.max_epochs 2 --trainer.gpus 4 --trainer.accelerator ddp
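The dotted flags in the command above (`--trainer.max_epochs`, `--trainer.gpus`, `--trainer.accelerator`) group options under the `trainer` namespace, presumably so the script can forward them to the Lightning `Trainer`. The sketch below only illustrates how such flags map onto a nested configuration. `parse_dotted_flags` is a hypothetical helper, not how Lightning's command-line interface is implemented.

```python
from typing import Any, Dict, List


def parse_dotted_flags(argv: List[str]) -> Dict[str, Any]:
    """Turn ['--a.b', '1', ...] into a nested dict such as {'a': {'b': 1}}."""
    config: Dict[str, Any] = {}
    args = iter(argv)
    for flag in args:
        if not flag.startswith("--"):
            raise ValueError(f"expected a --flag, got {flag!r}")
        value = next(args, None)
        if value is None:
            raise ValueError(f"missing value for {flag}")
        *parents, leaf = flag[2:].split(".")
        node = config
        for key in parents:
            # Create intermediate namespaces on demand.
            node = node.setdefault(key, {})
        # Only plain non-negative integers are converted; the rest stay strings.
        node[leaf] = int(value) if value.isdigit() else value
    return config


flags = "--trainer.max_epochs 2 --trainer.gpus 4 --trainer.accelerator ddp"
print(parse_dotted_flags(flags.split()))
```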

With Grid DataStores, which provide low-latency, highly scalable, auto-versioned datasets:

grid datastore create --name mnist --source data
grid run --instance_type 4_M60_8gb --datastore_name mnist --datastore_mount_dir data ddp_mnist_grid/lightning.py  --trainer.max_epochs 2 --trainer.gpus 4 --trainer.accelerator ddp

Pure PyTorch:

grid datastore create --name mnist --source data
grid run --instance_type g4dn.xlarge --gpus 2 ddp_mnist_grid/boring_pytorch.py

Add --use_spot to use interruptible machines.

Grid.ai makes scaling multi-node training easy :rocket: Train on 2+ nodes with 4 GPUs each using DDP Sharded :fire:

grid run --instance_type 4_M60_8gb --gpus 8 --datastore_name mnist --datastore_mount_dir data  ddp_mnist_grid/lightning.py  --trainer.max_epochs 2 --trainer.num_nodes 2 --trainer.gpus 4 --trainer.accelerator ddp_sharded
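In the command above, `--gpus 8` is consistent with `--trainer.num_nodes 2` and `--trainer.gpus 4`: two nodes with four GPUs each give eight processes in total. The arithmetic can be sketched as follows. The function names are hypothetical, and the rank formula is the common node-major convention, assumed here rather than taken from this repository.

```python
def world_size(num_nodes: int, gpus_per_node: int) -> int:
    """Total number of training processes, one per GPU."""
    return num_nodes * gpus_per_node


def global_rank(node_rank: int, local_rank: int, gpus_per_node: int) -> int:
    """Unique process id across all nodes, numbering node by node."""
    if not 0 <= local_rank < gpus_per_node:
        raise ValueError("local_rank must be in [0, gpus_per_node)")
    return node_rank * gpus_per_node + local_rank


num_nodes, gpus_per_node = 2, 4
print("world size:", world_size(num_nodes, gpus_per_node))
for node in range(num_nodes):
    ranks = [global_rank(node, gpu, gpus_per_node) for gpu in range(gpus_per_node)]
    print(f"node {node}: global ranks {ranks}")
```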

Train Andrej Karpathy's minGPT, converted to PyTorch Lightning by @williamFalcon and benchmarked with DeepSpeed by @SeanNaren:

git clone https://github.com/SeanNaren/minGPT.git
cd minGPT
git checkout benchmark
grid run --instance_type g4dn.12xlarge --gpus 8 benchmark.py --n_layer 6 --n_head 16 --n_embd 2048 --gpus 4 --num_nodes 2 --precision 16 --batch_size 32 --plugins deepspeed_stage_3

Learn how to scale your scripts with PyTorch Lightning + DeepSpeed.

Lightning Flash

Lightning Flash is a collection of tasks for fast prototyping, baselining, finetuning, and solving problems with deep learning, built on top of PyTorch Lightning.

Train a PyTorchVideo classifier with Lightning Flash. Check out the Grid.ai reproducible button:

import os

import flash
from flash.core.data.utils import download_data
from flash.video import VideoClassificationData, VideoClassifier

# 1. Create the DataModule
# Find more datasets at https://pytorchvideo.readthedocs.io/en/latest/data.html
download_data("https://pl-flash-data.s3.amazonaws.com/kinetics.zip", "./data")

datamodule = VideoClassificationData.from_folders(
    train_folder=os.path.join(os.getcwd(), "data/kinetics/train"),
    val_folder=os.path.join(os.getcwd(), "data/kinetics/val"),
    clip_sampler="uniform",
    clip_duration=1,
    decode_audio=False,
)

# 2. Build the task
model = VideoClassifier(backbone="x3d_xs", num_classes=datamodule.num_classes, pretrained=False)

# 3. Create the trainer and finetune the model
trainer = flash.Trainer(max_epochs=3)
trainer.finetune(model, datamodule=datamodule, strategy="freeze")

# 4. Make a prediction
predictions = model.predict(os.path.join(os.getcwd(), "data/kinetics/predict"))
print(predictions)

# 5. Save the model!
trainer.save_checkpoint("video_classification.pt")

Credits

Credit to the PyTorch Team for providing the bare MNIST example.

Credit to Andrej Karpathy for providing an implementation of minGPT.

Troubleshooting

Kill ddp processes

sudo kill -9 $(ps -aef | grep -i 'ddp' | grep -v 'grep' | awk '{ print $2 }')
