_githubsgi comments

Results: 46 comments by _githubsgi

Any plan to add Llama 1B and/or 3B models ?

Yes, FSDPv2 should be enough for both. Would be glad to do a PR to just functionally enable pretraining of 1B and 3B if you are willing to review and...

Any plan to add Llama 1B and/or 3B models ?

Created the PR - https://github.com/pytorch/torchtitan/pull/1040

Adding config options for deterministic execution

@tianyu-l, I understand your concern about inflating the number of configs. In brief, these are some knobs I needed to figure out to debug issues related to deterministic compute...

Adding config options for deterministic execution

@tianyu-l, it sounds like you want to pull all the debug config into a separate section in the toml file. Would it be something like the following in the...

Adding config options for deterministic execution

@fegin, what are your thoughts on this?

Adding config options for deterministic execution

@tianyu-l, does the following debug(?) section look OK? Whichever PR gets merged last, this or [that](https://github.com/pytorch/torchtitan/pull/1670), can expand the debug section. ``` [debug] torch_deterministic =...

TP SP examples improvement

Looks like the failing CUDA test below (Run Distributed Examples / test (pull_request)) is run with a relatively old version of PyTorch (torch==2.4.0.dev20240605+cu11). The upcoming release is...

TP SP examples improvement

@msaroufim, is it possible to update the PyTorch version in CI to 2.8?

TP SP examples improvement

@msaroufim, is there anything more I need to do?

TP SP examples improvement

@msaroufim, the "Run Distributed Examples" check is failing due to the following. ``` Running example: distributed/ddp /home/runner/work/examples/examples/distributed/ddp/.venv/lib/python3.8/site-packages/torch/_subclasses/functional_tensor.py:258: UserWarning: Failed to initialize NumPy: numpy.core.multiarray failed to import (Triggered internally...