_githubsgi
Yes, FSDPv2 should be enough for both. Would be glad to do a PR to just functionally enable pretraining of 1B and 3B if you are willing to review and...
Created the PR - https://github.com/pytorch/torchtitan/pull/1040
@tianyu-l, I understand your concern about inflating the number of configs. In brief, these are some knobs I needed to figure out to debug issues related to deterministic compute...
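For readers unfamiliar with this kind of knob, here is a minimal sketch of what deterministic-compute settings often look like in PyTorch. It is an illustration only, not the configuration from the PR: the helper name `enable_determinism`, the seed value, and the choice of settings are assumptions, and PyTorch is treated as optional so the snippet also runs without it.

```python
import os
import random


def enable_determinism(seed: int = 0) -> dict:
    """Apply a few common determinism settings and report what was applied.

    Hypothetical helper for illustration; not taken from torchtitan.
    """
    applied = {"seed": seed}

    # cuBLAS needs a fixed workspace size for deterministic GEMMs on CUDA;
    # this value is one of those listed in PyTorch's reproducibility notes.
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
    applied["CUBLAS_WORKSPACE_CONFIG"] = os.environ["CUBLAS_WORKSPACE_CONFIG"]

    # Seed Python's own RNG.
    random.seed(seed)

    try:
        import torch
    except ImportError:
        # PyTorch is not installed; only the generic settings were applied.
        applied["torch"] = False
        return applied

    # Seed PyTorch RNGs and request deterministic kernels; ops without a
    # deterministic implementation will raise an error when called.
    torch.manual_seed(seed)
    torch.use_deterministic_algorithms(True)
    # Disable cuDNN autotuning, which can pick different kernels per run.
    torch.backends.cudnn.benchmark = False
    applied["torch"] = True
    return applied


settings = enable_determinism(seed=123)
```

Each of these is a separate switch, which is why such settings tend to multiply into several config entries unless they are grouped.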
@tianyu-l, it sounds like you want to pull all the debug configs into a separate section in the toml file. Would it be something like the following in the...
@fegin, what are your thoughts on this?
@tianyu-l, does the following debug(?) section look OK? Whichever PR gets merged last, this or [that](https://github.com/pytorch/torchtitan/pull/1670), can expand the debug section. ``` [debug] torch_deterministic =...
Looks like the failing CUDA test below (Run Distributed Examples / test (pull_request)) is run with a relatively old version of PyTorch (torch==2.4.0.dev20240605+cu11 ). The upcoming release is...
@msaroufim, is it possible to update the PyTorch version in CI to 2.8?
@msaroufim, is there anything more I need to do?
@msaroufim, the "Run Distributed Examples" check is failing due to the following. ``` Running example: distributed/ddp /home/runner/work/examples/examples/distributed/ddp/.venv/lib/python3.8/site-packages/torch/_subclasses/functional_tensor.py:258: UserWarning: Failed to initialize NumPy: numpy.core.multiarray failed to import (Triggered internally...