AttributeError: 'Float8Tensor' object has no attribute '_fp8_attrs'
🐛 Bug
When using DDP with Dynamo+Thunder, we get:
AttributeError: 'Float8Tensor' object has no attribute '_fp8_attrs'
This issue affects the following models:
dolly-v2-3b, Mistral-7B-v0.1, tiny-llama-1.1b, stablecode-completion-alpha-3b, Phi-3-mini-4k-instruct, falcon-7b
To Reproduce
Please use 1 node with 8 GPUs.
Then execute:
torchrun --standalone --max-restarts=0 --no-python --nproc-per-node=8 python /opt/pytorch/lightning-thunder/thunder/benchmarks/benchmark_litgpt.py \
--model_name Mistral-7B-v0.1 \
--distributed_mode ddp \
--shard_mode None \
--compile dynamo_thunder \
--checkpoint_activations False \
--low_precision_mode fp8-delayed-te \
--micro_batch_size 1
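For reference, below is a minimal sketch (not the benchmark script itself) of the configuration the command above exercises: DDP, torch.compile with the Thunder Dynamo backend, and Transformer Engine's delayed-scaling FP8 autocast. The toy te.Linear model, tensor shapes, and recipe defaults are placeholders chosen for illustration; benchmark_litgpt.py builds the actual litgpt model selected by --model_name.

import os
import torch
import torch.distributed as dist
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling
from thunder.dynamo import ThunderCompiler

# Launched via torchrun, so the rank/world-size environment variables are already set.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Placeholder model; the benchmark builds the litgpt model instead.
model = te.Linear(4096, 4096).cuda()
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])

# --compile dynamo_thunder: go through torch.compile with Thunder as the Dynamo backend.
compiled = torch.compile(model, backend=ThunderCompiler())

# --low_precision_mode fp8-delayed-te: run under TE's delayed-scaling FP8 recipe.
recipe = DelayedScaling()
x = torch.randn(16, 4096, device="cuda", requires_grad=True)
with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
    y = compiled(x)
y.sum().backward()

dist.destroy_process_group()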
Expected behavior
We should not get an error.
Environment
system.device_product_name          DGXH100
system.gpu_driver_version           535.129.03
libraries.cuda                      12.6.1.006
libraries.pip.lightning             2.4.0.dev20240728
libraries.pip.lightning-thunder     0.2.0.dev0
libraries.pip.lightning-utilities   0.11.7
libraries.pip.litgpt                0.4.11
libraries.pip.nvfuser               0.2.10+git91997b3
libraries.pip.pytorch-lightning     2.4.0
libraries.pip.torch                 2.5.0a0+git9902b34
libraries.pip.torchmetrics          1.4.1
libraries.pip.torchvision           0.19.0a0+d23a6e1
rel: https://github.com/Lightning-AI/lightning-thunder/issues/1137
@mpatel31415, this should be fixed by #1204 and https://github.com/Lightning-AI/lightning-thunder/pull/1170.
I have tried running the above command and it now fails with an OOM error (instead of the AttributeError).
Running a smaller 25-layer model works fine now. Command for the 25-layer model:
torchrun --standalone --max-restarts=0 --no-python --nproc-per-node=8 python /opt/pytorch/lightning-thunder/thunder/benchmarks/benchmark_litgpt.py --model_name Mistral-7B-v0.1 --distributed_mode ddp --shard_mode None --compile dynamo_thunder --checkpoint_activations False --low_precision_mode fp8-delayed-te --micro_batch_size 1 --n_layer=25
Also, I have verified tiny-llama-1.1b, and it works fine:
torchrun --standalone --max-restarts=0 --no-python --nproc-per-node=8 python /opt/pytorch/lightning-thunder/thunder/benchmarks/benchmark_litgpt.py --model_name tiny-llama-1.1b --distributed_mode ddp --shard_mode None --compile dynamo_thunder --checkpoint_activations False --low_precision_mode fp8-delayed-te --micro_batch_size 1
Can you please check whether it works fine for the other models? Thank you very much!