Andrew Gu
These models are not supported yet. Only Llama is supported for now.
I am going to mark this as closed since it seems all questions have been answered. If you have follow-ups, feel free to re-open or open a new issue.
High-level comment: It might be worthwhile to document which parts depend on FSDP2 internals, so that we can see how to expose them more robustly.
I will wait for all devs to approve this one. We can wait until next year to land it, when land velocity is slower.
The failures are related to Dynamo and should be unrelated.
```
2023-01-17T17:30:37.5734804Z Traceback (most recent call last):
2023-01-17T17:30:37.5735042Z   File "/opt/conda/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 440, in _compile
2023-01-17T17:30:37.5735462Z ##[endgroup]
2023-01-17T17:30:37.5735569Z     check_fn = CheckFunctionManager(
2023-01-17T17:30:37.5735804Z...
```
@pytorchbot merge
Should we be able to close https://github.com/pytorch/torchtitan/issues/61 after this PR? Also, do we need to run end-to-end numerics testing?
This looks good to me. I will let @wz337 review and approve.
I did not look into this closely, but could we rely on `.contiguous()` being a no-op if already contiguous and remove the stride check? (There might be ever-so-slightly more CPU...
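A minimal sketch of the idea in the comment above, assuming PyTorch is installed. The helper name `ensure_contiguous` is hypothetical and is not taken from the PR under review. It only illustrates that `Tensor.contiguous()` hands back the same storage when the tensor is already contiguous and copies otherwise, which is why an explicit stride check may be redundant.

```python
import torch


def ensure_contiguous(t: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: no explicit stride check, just rely on
    # .contiguous() returning the input when it is already contiguous.
    return t.contiguous()


dense = torch.arange(6).reshape(2, 3)  # contiguous, row-major layout
transposed = dense.t()                 # a view with swapped strides, not contiguous

same = ensure_contiguous(dense)        # shares storage with `dense`, no copy
copied = ensure_contiguous(transposed) # materializes a new contiguous copy

print(same.data_ptr() == dense.data_ptr())
print(copied.data_ptr() != transposed.data_ptr())
```

Whether the extra call overhead matters on a hot path is something to measure in the actual code; this sketch does not try to answer that.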
@BadrYoubiIdrissi Curious which cases you are using fp16 training for (if you can share)?