NanoCode012

Results: 163 comments of NanoCode012

Was there any stack trace or error? Did you run out of space? Did the run quit abruptly?

This PR is currently wrong; I haven't had a chance to come back and fix it yet. It sometimes adds extra spaces. I need to check it again later.

I noticed that, when printing the tokens, there were "empty" tokens, which led to the extra spacing. Maybe all I need is a check that the string has len > 0, to...
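A minimal sketch of that check, assuming the decoded tokens come back as a list of strings (the helper name `join_tokens` is hypothetical):

```python
# Hypothetical helper: drop zero-length token strings before joining,
# so "empty" tokens don't introduce extra spaces in the output.
def join_tokens(token_strings):
    return " ".join(tok for tok in token_strings if len(tok) > 0)

print(join_tokens(["Hello", "", "world"]))  # -> "Hello world"
```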

The current code by default sets the LlamaTokenizer to use Llama's EOS token as the pad token, except for the Llama-2 chat class above. > And if you still need padding, then it's better to update...
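For illustration, a minimal sketch of that default (the model name is an assumption; the exact behaviour depends on the tokenizer class):

```python
from transformers import AutoTokenizer

# Assumed model name, for illustration only.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Llama tokenizers ship without a pad token; reusing EOS as the pad token
# lets batched padding work without resizing the embedding table.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
```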

Hey @artemdinaburg, sorry for bringing up an old topic. The information here helped me fix another issue we had with `token_type_ids`. From the glossary https://huggingface.co/docs/transformers/glossary#token-type-ids, it seems to be...
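As a rough illustration of what that glossary entry describes (BERT is chosen here only as an example of a model that uses segment ids):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")

# Encoding a sentence pair yields token_type_ids marking which segment
# each token belongs to: 0 for the first sentence, 1 for the second.
enc = tok("How are you?", "I am fine.")
print(enc["token_type_ids"])  # e.g. [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```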

@MeDott29, you may be having FlashAttention (FA) issues: `RuntimeError: FlashAttention only supports Ampere GPUs or newer.` @amitagh, did you try other example configs?
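A quick way to check for that (a sketch, assuming a single CUDA device at index 0): FlashAttention needs compute capability 8.0 (Ampere) or newer.

```python
import torch

# Ampere GPUs report compute capability major version 8; older cards
# (e.g. T4, V100) trip the FlashAttention error above.
major, minor = torch.cuda.get_device_capability(0)
if major < 8:
    print(f"Compute capability {major}.{minor}: disable flash attention in the config.")
```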

@chdaesung @jaydeepthik , sorry, I missed the notifications. Do you have a sample Colab notebook? I was able to run successfully on Colab these past weeks.

Hey, this is very interesting. Should there be some full run comparisons to make sure that there is no loss in performance?

FSDP for Mistral uses `fsdp_transformer_layer_cls_to_wrap: MistralDecoderLayer`. If you would like to contribute an example, please feel free to make a PR.
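For context, a hedged sketch of what that config key points at in plain PyTorch FSDP terms (requires a `transformers` version that includes Mistral; this is illustrative, not axolotl's internal wiring):

```python
import functools

from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers.models.mistral.modeling_mistral import MistralDecoderLayer

# An FSDP auto-wrap policy that shards the model at Mistral's
# decoder-layer boundary, matching the config value above.
auto_wrap_policy = functools.partial(
    transformer_auto_wrap_policy,
    transformer_layer_cls={MistralDecoderLayer},
)
```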

Is it due to a dtype mismatch?
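For context, a toy example of the kind of dtype mismatch meant here (purely illustrative):

```python
import torch

a = torch.randn(2, 2, dtype=torch.float16)
b = torch.randn(2, 2, dtype=torch.float32)

# a @ b would raise a dtype mismatch ("expected ... Half but found Float");
# casting to a common dtype first avoids it.
c = a.float() @ b
print(c.dtype)  # torch.float32
```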