ye-jin-shop

2 issues by ye-jin-shop

I am trying to add weights to the loss function. I think it would be nice to have this built into the class itself. The original class CrossEntropyLoss has weight as...

enhancement
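To make the request above concrete, here is a minimal sketch of what per-class weights do to a cross-entropy loss. It is written in plain Python rather than against any particular library class, and the names `weighted_cross_entropy` and `class_weights` are hypothetical, chosen only for illustration. The normalization (dividing by the sum of the weights of the target classes) follows the documented mean-reduction behavior of `torch.nn.CrossEntropyLoss` when its `weight` argument is set. This is not the implementation the issue proposes.

```python
import math


def weighted_cross_entropy(logits, targets, class_weights=None):
    """Mean cross-entropy over a batch with optional per-class weights.

    logits: list of per-sample score lists, one raw score per class.
    targets: list of integer class indices, one per sample.
    class_weights: optional list with one weight per class
        (hypothetical parameter name; defaults to all ones).
    """
    num_classes = len(logits[0])
    if class_weights is None:
        class_weights = [1.0] * num_classes

    total_loss = 0.0
    total_weight = 0.0
    for scores, target in zip(logits, targets):
        # Numerically stable log-sum-exp over the class scores.
        peak = max(scores)
        log_norm = peak + math.log(sum(math.exp(s - peak) for s in scores))
        # Negative log-likelihood of the target class.
        nll = log_norm - scores[target]
        weight = class_weights[target]
        total_loss += weight * nll
        total_weight += weight

    # Weighted mean: normalize by the summed weights of the targets.
    return total_loss / total_weight
```

With all weights equal, this reduces to the ordinary mean cross-entropy. Raising the weight of a class makes mistakes on samples of that class count for more in the average.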

I noticed the following error message while running a full fine-tune of the Llama 3.1 70B model:
```
...
(task, pid=3369) [rank7]: File "/usr/local/lib/python3.10/dist-packages/torch/_inductor/kernel/flex_attention.py", line 2129, in flex_attention_backward
(task, pid=3369) [rank7]: broadcasted_grad_key =...
```

bug