tginart
I ran into this as well. You can fix it by just removing the type hints. You'll probably run into more bugs after that, though. Let me know...
Hi @abhi-mosaic. I am referring to the metrics from the Composer Evaluator in train.py (https://github.com/mosaicml/llm-foundry/blob/3c66b1c5df668e0684548fef30d00669df64636c/scripts/train/train.py#LL158C1-L162C79), so I'm not sure we are talking about the same thing? I'm still running...
Hi @abhi-mosaic, thank you. That is what I am looking for but I was wondering if it was possible to compute loss/train for every example in the eval set &...
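To make the ask concrete, here's a minimal sketch of what I mean by a per-example `loss/train` over the eval set. This is plain PyTorch with made-up tensor shapes, not Composer's actual Evaluator API: compute the token losses with `reduction="none"` and average per sequence instead of over the whole batch.

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: in practice logits come from the model's eval-time
# forward pass and labels from the eval dataloader.
logits = torch.randn(4, 10, 32)        # (batch, seq_len, vocab)
labels = torch.randint(0, 32, (4, 10)) # (batch, seq_len)

# reduction="none" keeps one loss per token; cross_entropy expects the
# class dimension second, hence the transpose to (batch, vocab, seq_len).
per_token = F.cross_entropy(logits.transpose(1, 2), labels, reduction="none")

# Average over the sequence dimension -> one loss value per example,
# rather than the single batch-level scalar the Evaluator reports.
per_example = per_token.mean(dim=1)    # shape: (4,)
```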
Not sure if this is helpful but I was *only* able to get Triton's flash attention to work on an A100. I tried H100, A10, A6000... & nope.
I second SLURM! I have also been trying to hack this into torchtune since the single-node experience is quite good.
@EugenHotaj Thanks for the tip. Did you use something like https://github.com/pytorch/torchtune/blob/main/recipes/full_finetune_distributed.py as the entry point to replace "./train.py" in [line 63](https://github.com/pytorch/torchtitan/blob/e8977b0071c868f44552a5c8bfbcb66b8fda1efe/multinode_trainer.slurm#L62) ?
Definitely… if this becomes supported I'd love to beta test an official multi-node recipe.
I forgot to update here but I can confirm @EugenHotaj 's approach of using the torchtitan slurm file (with a few tweaks that are probably specific to your own slurm...
Hi @EugenHotaj, is there an example anywhere of how to set `train_on_input=False` in the config?
Ah! Thank you! If we just want regular masking for the user input, how do we turn on `train_on_input=False`? Or is it on by default? @RdoubleA ?
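In case it helps later readers, here's a sketch of how I'd expect this to look in a torchtune YAML config, with `train_on_input` passed through to the dataset builder. The `_component_` path and config layout are assumptions from the docs, so check them against your torchtune version:

```yaml
# Hypothetical config fragment -- verify the builder path for your version.
dataset:
  _component_: torchtune.datasets.alpaca_dataset
  train_on_input: False  # mask prompt/user-input tokens out of the loss
```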