Chris Siefert

165 comments by Chris Siefert

@ndellingwood @iyamazaki

See also https://github.com/trilinos/Trilinos/issues/13445

This issue is also relevant to sync-behavior documentation.

@achauphan I'm getting:

```
+==============================================================================+
| ERROR: The following section(s) in your config-specs.ini file
| do not match any systems listed in
| 'supported-systems.ini':
```

It really thinks my machine...

@achauphan
Machine: One of the ascicgpus
Config: sems-gnu-8.3.0-openmpi-1.10.1

The bypass only generates an error in the shell script, so clearly I'm misunderstanding the instructions somehow.

> Just launch the scripts with `accelerate launch` or `torchrun`, no need to do anything else

My attempts to do that have not been successful... run_clm seems happy to fill...

For instance, with the v4.41-release branch of transformers from GitHub, if I grab 4 A100s with 80GB of RAM each and do this:

```
torchrun --nproc-per-node 4 ./run_clm.py --model_name_or_path=mistralai/Mistral-7B-Instruct-v0.2...
```

@amyeroberts That example doesn't use the `trainer.train()` function, which is what I'd (ideally) like to use.