FSDP: GPT2LMHeadModel object has no attribute model
First of all, thank you so much for putting together these tutorials! I am slowly working through them and trying to better understand how it all fits together.
I have the DDP example working well on my 4 x L40S single-node server, but I can't seem to get the FSDP example to work on a single node (maybe that's my problem?).
# torchrun --nproc-per-node gpu train_llm.py -d tatsu-lab/alpaca -m openai-community/gpt2 --cpu-offload
[rank1]: Traceback (most recent call last):
[rank1]: File "/workspace/distributed-training-guide/04-fully-sharded-data-parallel/train_llm.py", line 389, in <module>
[rank1]: main()
[rank1]: File "/usr/local/lib/python3.12/dist-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
[rank1]: return f(*args, **kwargs)
[rank1]: ^^^^^^^^^^^^^^^^^^
[rank1]: File "/workspace/distributed-training-guide/04-fully-sharded-data-parallel/train_llm.py", line 88, in main
[rank1]: for decoder in model.model.layers:
[rank1]: ^^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1940, in __getattr__
[rank1]: raise AttributeError(
[rank1]: AttributeError: 'GPT2LMHeadModel' object has no attribute 'model'
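If I'm reading the traceback right, the script assumes a Llama-style layout where the decoder blocks live at model.model.layers, whereas GPT2LMHeadModel keeps them at model.transformer.h. Here is a minimal sketch of the kind of fallback I had in mind; the get_decoder_blocks helper is hypothetical and not from the guide:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("openai-community/gpt2")

def get_decoder_blocks(model):
    # Llama-style models expose their decoder blocks at model.model.layers,
    # while GPT-2 keeps them at model.transformer.h.
    if hasattr(model, "model") and hasattr(model.model, "layers"):
        return model.model.layers      # e.g. LlamaForCausalLM
    if hasattr(model, "transformer") and hasattr(model.transformer, "h"):
        return model.transformer.h     # e.g. GPT2LMHeadModel
    raise AttributeError("Could not locate decoder blocks on this model")

for decoder in get_decoder_blocks(model):
    ...  # wrap each block for FSDP here, as the example does for Llama
```

With something like that, the per-block FSDP wrapping would presumably work for GPT-2 too, though I haven't verified it end to end.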
I haven't gotten as far as testing multi-node yet, but that was going to be my next step.
I think I have all the correct requirements (transformers==4.57.0), but the PyTorch version is the somewhat customised 2.8.0 build that ships in the NVIDIA PyTorch container (2.8.0a0+5228986c39.nv25.06).
The model and dataset are downloaded, cached, and verified to work with the DDP example (even "offline").
Off topic: the inclusion of the llama-405b tutorial is great, but I doubt many people will be able to run it! It would be awesome if you could also include a more modest but still larger-than-GPT-2 training example, for instance for people with a few nodes of dual or quad 48 GB GPUs.
Anyway, thanks again for putting this together, much appreciated.