_githubsgi

46 comments by _githubsgi

Yes, FSDPv2 should be enough for both. I would be glad to open a PR that just functionally enables pretraining of the 1B and 3B models if you are willing to review and...

Created the PR: https://github.com/pytorch/torchtitan/pull/1040

@tianyu-l, I understand your concern about inflating the number of configs. In brief, these are some knobs I needed to figure out in order to debug issues related to deterministic compute...

@tianyu-l, it sounds like you want to pull all the debug configs into a separate section of the TOML file. Would it be something like the following in the...

@fegin, what are your thoughts on this?

@tianyu-l, does the following debug(?) section look OK? Whichever PR gets merged last, this one or [that one](https://github.com/pytorch/torchtitan/pull/1670), can expand the debug section.
```
[debug]
torch_deterministic = ...
```

It looks like the failing CUDA test below ("Run Distributed Examples / test (pull_request)") is run with a relatively old version of PyTorch (`torch==2.4.0.dev20240605+cu11`). The upcoming release is...

@msaroufim, is it possible to update the PyTorch version in CI to 2.8?
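As a sketch of how a CI job could fail fast when it picks up an older PyTorch than a change needs, here is a small version guard. The function name `torch_version_at_least` is hypothetical, and it deliberately compares only major.minor, so a 2.8 nightly counts as 2.8; that is an assumption about what "2.8" should mean here.

```python
import re


def torch_version_at_least(version: str, minimum: tuple[int, int]) -> bool:
    """Compare the major.minor part of a torch version string against a minimum.

    Pre-release and local suffixes (e.g. ".dev20240605", "+cu118") are ignored.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major_minor = (int(match.group(1)), int(match.group(2)))
    # Tuple comparison is numeric, so (2, 10) correctly ranks above (2, 8)
    return major_minor >= minimum


if __name__ == "__main__":
    print(torch_version_at_least("2.4.0.dev20240605+cu11", (2, 8)))
```

In a CI step this could be called as `torch_version_at_least(torch.__version__, (2, 8))`; the version string reported above would return `False`.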

@msaroufim, is there anything more I need to do?

@msaroufim, the "Run Distributed Examples" check is failing due to the following:
```
Running example: distributed/ddp
/home/runner/work/examples/examples/distributed/ddp/.venv/lib/python3.8/site-packages/torch/_subclasses/functional_tensor.py:258: UserWarning: Failed to initialize NumPy: numpy.core.multiarray failed to import (Triggered internally...
```