tginart
I ran into this as well. You can fix it by just removing the type hints. You'll probably run into more bugs after that, though. Let me know...
Hi @abhi-mosaic. I am referring to the metrics from the Composer Evaluator in train.py (https://github.com/mosaicml/llm-foundry/blob/3c66b1c5df668e0684548fef30d00669df64636c/scripts/train/train.py#LL158C1-L162C79), so I'm not sure we are talking about the same thing? I'm still running...
Hi @abhi-mosaic, thank you. That is what I am looking for but I was wondering if it was possible to compute loss/train for every example in the eval set &...
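To make the ask concrete, here's a minimal sketch of what I mean by a per-example `loss/train` over the eval set. This is plain PyTorch with made-up tensor shapes, not Composer's actual Evaluator API: compute the token losses with `reduction="none"` and average per sequence instead of over the whole batch.

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: in practice logits come from the model's eval-time
# forward pass and labels from the eval dataloader.
logits = torch.randn(4, 10, 32)        # (batch, seq_len, vocab)
labels = torch.randint(0, 32, (4, 10)) # (batch, seq_len)

# reduction="none" keeps one loss per token; cross_entropy expects the
# class dimension second, hence the transpose to (batch, vocab, seq_len).
per_token = F.cross_entropy(logits.transpose(1, 2), labels, reduction="none")

# Average over the sequence dimension -> one loss value per example,
# rather than the single batch-level scalar the Evaluator reports.
per_example = per_token.mean(dim=1)    # shape: (4,)
```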
Not sure if this is helpful but I was *only* able to get Triton's flash attention to work on an A100. I tried H100, A10, A6000... & nope.
I second SLURM! I have also been trying to hack this into torchtune since the single-node experience is quite good.
@EugenHotaj Thanks for the tip. Did you use something like https://github.com/pytorch/torchtune/blob/main/recipes/full_finetune_distributed.py as the entry point to replace "./train.py" in [line 63](https://github.com/pytorch/torchtitan/blob/e8977b0071c868f44552a5c8bfbcb66b8fda1efe/multinode_trainer.slurm#L62) ?
Definitely… if this becomes supported I'd love to beta test an official multi-node recipe.
I forgot to update here but I can confirm @EugenHotaj 's approach of using the torchtitan slurm file (with a few tweaks that are probably specific to your own slurm...
Hi @EugenHotaj, is there an example anywhere of how to set `train_on_input=False` in the config?
Ah! Thank you! If we just want regular masking for the user input, how do we turn on `train_on_input=False`? Or is it on by default? @RdoubleA ?
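In case it helps later readers, here's a sketch of how I'd expect this to look in a torchtune YAML config, with `train_on_input` passed through to the dataset builder. The `_component_` path and config layout are assumptions from the docs, so check them against your torchtune version:

```yaml
# Hypothetical config fragment -- verify the builder path for your version.
dataset:
  _component_: torchtune.datasets.alpaca_dataset
  train_on_input: False  # mask prompt/user-input tokens out of the loss
```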