
Converting PyTorch 2 Lightning Examples

The repository will show you how to:

  • Convert a pure PyTorch Convolutional Neural Network Classifier trained on MNIST to PyTorch Lightning.
  • Extend Pure PyTorch trivially with Lightning best practice features.
  • Seamlessly scale your training in the cloud with Grid.ai - No code changes.
  • Learn about Lightning Flash and its 15+ production-ready tasks.

William Falcon and Thomas Chaton presented this repository in the talk "PyTorch Community Voices | PyTorch Lightning".

Bare MNIST Classifier

MNIST Dataset

  • PyTorch | 127 lines
  • Lightning | 101 lines

Add DDP Support

  • PyTorch | 184 lines
  • Lightning | 102 lines: -82 lines

Add DDP Spawn Support

  • PyTorch | 196 lines
  • Lightning | 105 lines: -91 lines

Add Accumulated Gradients Support

  • PyTorch | 198 lines
  • Lightning | 106 lines: -92 lines
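
Gradient accumulation sums the gradients of several small batches before taking a single optimizer step, which emulates a larger batch. The sketch below is not taken from the repository: it uses a hypothetical one-parameter model `w * x` with a squared-error loss, in plain Python, to show that accumulating over equally sized micro-batches reproduces the full-batch gradient. In Lightning, this behavior is requested with the `accumulate_grad_batches` argument of `Trainer`.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, accumulate_steps):
    """Accumulate micro-batch gradients; len(xs) must divide evenly."""
    size = len(xs) // accumulate_steps
    total = 0.0
    for step in range(accumulate_steps):
        lo, hi = step * size, (step + 1) * size
        # Scale each micro-batch gradient by 1 / accumulate_steps so the
        # accumulated value matches the gradient of the full-batch mean loss.
        total += grad_mse(w, xs[lo:hi], ys[lo:hi]) / accumulate_steps
    return total


# Hypothetical toy data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

full = grad_mse(w, xs, ys)
accum = accumulated_grad(w, xs, ys, accumulate_steps=2)
print(full, accum)
```

Both values agree, so one optimizer step after two accumulated micro-batches of size 2 behaves like one step on a batch of size 4 for this loss.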

Add Profiling Support

  • PyTorch | 226 lines
  • Lightning | 106 lines: -120 lines
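
The line counts above can be checked with a few lines of Python. This is only a bookkeeping sketch: it reads the 198 and 226 PyTorch figures as total line counts (an assumption, though it matches the listed -92 and -120 differences) and computes the saving for each feature.

```python
# (PyTorch lines, Lightning lines) as listed in the sections above.
line_counts = {
    "Bare MNIST": (127, 101),
    "DDP": (184, 102),
    "DDP Spawn": (196, 105),
    "Accumulated Gradients": (198, 106),
    "Profiling": (226, 106),
}

# Lines saved by the Lightning version of each script.
savings = {name: pt - pl for name, (pt, pl) in line_counts.items()}

for name, saved in savings.items():
    pt, pl = line_counts[name]
    print(f"{name}: {pt} -> {pl} lines ({saved} saved, {saved / pt:.0%})")
```

The PyTorch script grows by 99 lines from the bare classifier to the profiling version, while the Lightning script grows by 5.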

Add DeepSpeed, FSDP, Multiple Loggers, Multiple Profilers, TorchScript, Loop Customization, Fault Tolerant Training, etc.

  • PyTorch | Requires a huge number of additional lines. You definitely do not want to do that :tired_face:
  • PyTorch Lightning | Still ~ 106 lines. Let's keep it simple. :rocket:

Learn more with Lightning Docs.

PyTorch Lightning 1.4 is out! Here is our CHANGELOG.

Don't forget to :star: PyTorch Lightning.

Training on Grid.ai

Grid.ai is an ML platform from the creators of PyTorch Lightning that lets you train machine learning code without worrying about infrastructure.

Learn more with Grid.ai Docs

1. Install Lightning-Grid

pip install lightning-grid --upgrade

2. Seamlessly train 100s of machine learning models on the cloud from your laptop - no code changes

grid run --instance_type 4_M60_8gb ddp_mnist_grid/lightning.py --trainer.max_epochs 2 --trainer.gpus 4 --trainer.accelerator ddp
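
In the command above, the flags placed before the script path (`--instance_type`) appear to configure the Grid run, while those after it (`--trainer.*`) are forwarded to the script. The sketch below makes that split explicit by assembling the same command from two dictionaries. The helper `build_grid_run` is hypothetical and not part of the Grid CLI; it only uses flags that appear in this README.

```python
import shlex


def build_grid_run(script, grid_flags, script_args):
    """Assemble a `grid run` argv list: Grid flags, then script, then script args."""
    argv = ["grid", "run"]
    for flag, value in grid_flags.items():
        argv += [f"--{flag}", str(value)]
    argv.append(script)
    for flag, value in script_args.items():
        argv += [f"--{flag}", str(value)]
    return argv


argv = build_grid_run(
    "ddp_mnist_grid/lightning.py",
    grid_flags={"instance_type": "4_M60_8gb"},
    script_args={
        "trainer.max_epochs": 2,
        "trainer.gpus": 4,
        "trainer.accelerator": "ddp",
    },
)

# shlex.join quotes arguments where needed, so the string is safe to paste in a shell.
command = shlex.join(argv)
print(command)
```

To launch the run from Python instead of copying the string, `argv` can be passed to `subprocess.run(argv, check=True)`, assuming the `grid` CLI is installed and logged in.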

With Grid Datastores, which provide low-latency, highly scalable, auto-versioned datasets:

grid datastore create --name mnist --source data
grid run --instance_type 4_M60_8gb --datastore_name mnist --datastore_mount_dir data ddp_mnist_grid/lightning.py  --trainer.max_epochs 2 --trainer.gpus 4 --trainer.accelerator ddp

Pure PyTorch:

grid datastore create --name mnist --source data
grid run --instance_type g4dn.xlarge --gpus 2 ddp_mnist_grid/boring_pytorch.py

Add --use_spot to use interruptible machines.

Grid.ai makes scaling multi-node training easy :rocket: Train on 2+ nodes with 4 GPUs each using DDP Sharded :fire:

grid run --instance_type 4_M60_8gb --gpus 8 --datastore_name mnist --datastore_mount_dir data  ddp_mnist_grid/lightning.py  --trainer.max_epochs 2 --trainer.num_nodes 2 --trainer.gpus 4 --trainer.accelerator ddp_sharded
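
The multi-node command combines three numbers: `--gpus 8` for Grid, and `--trainer.num_nodes 2` with `--trainer.gpus 4` for the script. Reading `--gpus` as the total GPU count across nodes is an assumption, but it is consistent with 2 nodes of 4 GPUs each. A small, hypothetical helper can check that the three flags agree before a run is submitted:

```python
def check_topology(total_gpus, gpus_per_node, num_nodes):
    """Return the DDP world size, or raise if the flags disagree."""
    world_size = gpus_per_node * num_nodes
    if world_size != total_gpus:
        raise ValueError(
            f"{num_nodes} nodes x {gpus_per_node} GPUs = {world_size}, "
            f"but {total_gpus} GPUs were requested"
        )
    return world_size


# Values taken from the command above.
world_size = check_topology(total_gpus=8, gpus_per_node=4, num_nodes=2)
print(world_size)
```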

Train Andrej Karpathy's minGPT, converted to PyTorch Lightning by @williamFalcon and benchmarked with DeepSpeed by @SeanNaren

git clone https://github.com/SeanNaren/minGPT.git
cd minGPT
git checkout benchmark
grid run --instance_type g4dn.12xlarge --gpus 8 benchmark.py --n_layer 6 --n_head 16 --n_embd 2048 --gpus 4 --num_nodes 2 --precision 16 --batch_size 32 --plugins deepspeed_stage_3

Learn how to scale your scripts with PyTorch Lightning + DeepSpeed

Lightning Flash

Lightning Flash is a collection of tasks for fast prototyping, baselining, finetuning, and solving problems with deep learning, built on top of PyTorch Lightning.

Train a PyTorchVideo Classifier with Lightning Flash. Check out the Grid.ai reproducible button: Grid

import os

import flash
from flash.core.data.utils import download_data
from flash.video import VideoClassificationData, VideoClassifier

# 1. Create the DataModule
# Find more datasets at https://pytorchvideo.readthedocs.io/en/latest/data.html
download_data("https://pl-flash-data.s3.amazonaws.com/kinetics.zip", "./data")

datamodule = VideoClassificationData.from_folders(
    train_folder=os.path.join(os.getcwd(), "data/kinetics/train"),
    val_folder=os.path.join(os.getcwd(), "data/kinetics/val"),
    clip_sampler="uniform",
    clip_duration=1,
    decode_audio=False,
)

# 2. Build the task
model = VideoClassifier(backbone="x3d_xs", num_classes=datamodule.num_classes, pretrained=False)

# 3. Create the trainer and finetune the model
trainer = flash.Trainer(max_epochs=3)
trainer.finetune(model, datamodule=datamodule, strategy="freeze")

# 4. Make a prediction
predictions = model.predict(os.path.join(os.getcwd(), "data/kinetics/predict"))
print(predictions)

# 5. Save the model!
trainer.save_checkpoint("video_classification.pt")

Credits

Credit to the PyTorch Team for providing the Bare MNIST example.

Credit to Andrej Karpathy for providing an implementation of minGPT.

Troubleshooting

Kill DDP processes

sudo kill -9 $(ps -aef | grep -i 'ddp' | grep -v 'grep' | awk '{ print $2 }')
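
The pipeline above lists all processes, keeps lines that contain "ddp" (case-insensitive), drops the `grep` process itself, extracts the second column (the PID), and sends SIGKILL to each one. Because the pattern matches any command line containing "ddp", it can help to inspect the selected PIDs first. The sketch below mirrors only the selection step in Python; the sample `ps` output is made up for illustration.

```python
def ddp_pids(ps_output):
    """Return PIDs of lines mentioning 'ddp', mirroring grep -i / grep -v / awk."""
    pids = []
    for line in ps_output.splitlines():
        # grep -i 'ddp' is case-insensitive; grep -v 'grep' is case-sensitive.
        if "ddp" not in line.lower() or "grep" in line:
            continue
        fields = line.split()
        pids.append(int(fields[1]))  # second column of `ps -aef` is the PID
    return pids


# Made-up sample in the `ps -aef` layout (UID PID PPID C STIME TTY TIME CMD).
sample = """\
UID        PID  PPID  C STIME TTY          TIME CMD
user      4021     1  9 10:02 pts/0    00:01:12 python ddp_mnist_grid/lightning.py
user      4022  4021  9 10:02 pts/0    00:01:10 python ddp_mnist_grid/lightning.py
user      4100  3990  0 10:05 pts/0    00:00:00 grep -i ddp
user      4200     1  0 10:06 pts/1    00:00:00 vim notes.txt
"""

print(ddp_pids(sample))
```

On a real machine, the input could come from `subprocess.run(["ps", "-aef"], capture_output=True, text=True).stdout`, and each reviewed PID could then be terminated with `os.kill(pid, signal.SIGKILL)`.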