Carlos Mocholí


It's a well-known trick for training with variable-length sequences. It can sometimes impact the loss because it can break the i.i.d. property of ML training, depending on your data....
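A minimal sketch of one such trick, assuming the comment refers to length-bucketed batching (all names here are hypothetical, not from lit-llama): sorting sequences by length before batching reduces padding, but it also makes each batch length-correlated, which is how the i.i.d. assumption can be weakened when length correlates with content.

```python
def bucket_by_length(sequences, batch_size):
    """Group sequences into batches of similar length to minimize padding."""
    ordered = sorted(sequences, key=len)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]

def padding_tokens(batches, pad=0):
    """Count pad tokens needed when each batch is padded to its longest item."""
    total = 0
    for batch in batches:
        longest = max(len(seq) for seq in batch)
        total += sum(longest - len(seq) for seq in batch)
    return total

# Toy corpus: six sequences of lengths 3, 9, 4, 10, 2, 8.
seqs = [[1] * n for n in (3, 9, 4, 10, 2, 8)]
naive = [seqs[i:i + 2] for i in range(0, len(seqs), 2)]
print(padding_tokens(naive))                      # arrival-order batches: 18 pad tokens
print(padding_tokens(bucket_by_length(seqs, 2)))  # length-sorted batches: 6 pad tokens
```

The padding saving is the upside; the downside is that batches are no longer random draws from the dataset.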

Linked issue request for fine-tuning: https://github.com/Lightning-AI/lit-llama/issues/180

It's complaining about a missing comma in the JSON file you are loading. Where did you get this file from? Have you tried downloading it again?

Hi @mzchtx. What changes are you proposing precisely? `k, v` should already be sliced to the length of `input_pos` with https://github.com/Lightning-AI/lit-llama/blob/main/lit_llama/model.py#L217-L218
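To illustrate the point, here is a minimal sketch of incremental kv-cache decoding (hypothetical names, plain Python lists standing in for tensors; this is not the lit-llama code at the linked lines): the cache is preallocated up to a maximum sequence length, the new key/value is written at `input_pos`, and attention only ever reads the filled prefix, so `k, v` are already limited to the positions seen so far.

```python
MAX_SEQ_LEN = 8

def make_cache():
    # One slot per position; None marks an unfilled slot.
    return {"k": [None] * MAX_SEQ_LEN, "v": [None] * MAX_SEQ_LEN}

def update_cache(cache, input_pos, k_new, v_new):
    """Write the new key/value at input_pos and return the valid prefix."""
    cache["k"][input_pos] = k_new
    cache["v"][input_pos] = v_new
    # Attention should only see positions [0, input_pos]; slicing here means
    # downstream code never attends over unfilled cache slots.
    return cache["k"][: input_pos + 1], cache["v"][: input_pos + 1]

cache = make_cache()
for pos in range(3):
    k, v = update_cache(cache, pos, f"k{pos}", f"v{pos}")
print(k)  # ['k0', 'k1', 'k2']
```

Under this assumption, an extra slice elsewhere would be redundant, which is why the question is what the proposed change adds.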

@mzchtx Would you like to open a PR with your suggested changes?

@mzchtx The code would need to be indented under the `if` as before, since this is only relevant for the kv-cache case. Leaving #382 aside, I believe the code should...

From playing with this, the generated outputs are not the same, meaning that this is not numerically equivalent. However, it's hard to tell if they are worse or just different....

I stumbled upon this issue: https://github.com/pytorch/pytorch/issues/103082, it might explain the numerical difference.

@gkroiz Could this change be detrimental to XLA's performance?

@mzchtx Did you measure the performance difference? Would you like to open a PR with your suggestion?