AttributeError: 'Float8Tensor' object has no attribute '_fp8_attrs'
🐛 Bug
When using DDP with Dynamo+Thunder, we get:
AttributeError: 'Float8Tensor' object has no attribute '_fp8_attrs'
This issue affects the following models:
dolly-v2-3b, Mistral-7B-v0.1, tiny-llama-1.1b, stablecode-completion-alpha-3b, Phi-3-mini-4k-instruct, falcon-7b
To Reproduce
Please use 1 node with 8 GPUs.
Then execute:
torchrun --standalone --max-restarts=0 --no-python --nproc-per-node=8 python /opt/pytorch/lightning-thunder/thunder/benchmarks/benchmark_litgpt.py \
--model_name Mistral-7B-v0.1 \
--distributed_mode ddp \
--shard_mode None \
--compile dynamo_thunder \
--checkpoint_activations False \
--low_precision_mode fp8-delayed-te \
--micro_batch_size 1
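For reference, below is a minimal sketch (not the benchmark script itself) of the configuration the command above exercises: DDP, torch.compile with the Thunder Dynamo backend, and Transformer Engine's delayed-scaling FP8 autocast. The toy te.Linear model, tensor shapes, and recipe defaults are placeholders chosen for illustration; benchmark_litgpt.py builds the actual litgpt model selected by --model_name.

import os
import torch
import torch.distributed as dist
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling
from thunder.dynamo import ThunderCompiler

# Launched via torchrun, so the rank/world-size environment variables are already set.
dist.init_process_group(backend="nccl")
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

# Placeholder model; the benchmark builds the litgpt model instead.
model = te.Linear(4096, 4096).cuda()
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])

# --compile dynamo_thunder: go through torch.compile with Thunder as the Dynamo backend.
compiled = torch.compile(model, backend=ThunderCompiler())

# --low_precision_mode fp8-delayed-te: run under TE's delayed-scaling FP8 recipe.
recipe = DelayedScaling()
x = torch.randn(16, 4096, device="cuda", requires_grad=True)
with te.fp8_autocast(enabled=True, fp8_recipe=recipe):
    y = compiled(x)
y.sum().backward()

dist.destroy_process_group()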
Expected behavior
We should not get an error.
Environment
system.device_product_name          DGXH100
system.gpu_driver_version           535.129.03
libraries.cuda                      12.6.1.006
libraries.pip.lightning             2.4.0.dev20240728
libraries.pip.lightning-thunder     0.2.0.dev0
libraries.pip.lightning-utilities   0.11.7
libraries.pip.litgpt                0.4.11
libraries.pip.nvfuser               0.2.10+git91997b3
libraries.pip.pytorch-lightning     2.4.0
libraries.pip.torch                 2.5.0a0+git9902b34
libraries.pip.torchmetrics          1.4.1
libraries.pip.torchvision           0.19.0a0+d23a6e1
rel: https://github.com/Lightning-AI/lightning-thunder/issues/1137
@mpatel31415, this should be fixed by #1204 and https://github.com/Lightning-AI/lightning-thunder/pull/1170.
I have tried running the above command and it now fails with an OOM error (instead of the AttributeError).
Running a smaller 25-layer model works fine now. Command for the 25-layer model:
torchrun --standalone --max-restarts=0 --no-python --nproc-per-node=8 python /opt/pytorch/lightning-thunder/thunder/benchmarks/benchmark_litgpt.py --model_name Mistral-7B-v0.1 --distributed_mode ddp --shard_mode None --compile dynamo_thunder --checkpoint_activations False --low_precision_mode fp8-delayed-te --micro_batch_size 1 --n_layer=25
Also, I have verified tiny-llama-1.1b, and it works fine:
torchrun --standalone --max-restarts=0 --no-python --nproc-per-node=8 python /opt/pytorch/lightning-thunder/thunder/benchmarks/benchmark_litgpt.py --model_name tiny-llama-1.1b --distributed_mode ddp --shard_mode None --compile dynamo_thunder --checkpoint_activations False --low_precision_mode fp8-delayed-te --micro_batch_size 1
Can you please check whether it works fine for the other models? Thank you very much!