NanoCode012

Results: 163 comments of NanoCode012

Was there any stack trace or error? Did you run out of space? Did the run quit abruptly?

This PR is currently wrong; I haven't had a chance to come back and fix it yet. It sometimes adds extra spaces. I need to check it again later.

I noticed that, when printing the tokens, there were "empty" tokens, which led to the extra spacing. Maybe all I need is a check that the string has len > 0, to...
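A minimal sketch of that check, assuming the decoded tokens come back as a list of strings (the helper name `join_tokens` is hypothetical):

```python
# Hypothetical helper: drop zero-length token strings before joining,
# so "empty" tokens don't introduce extra spaces in the output.
def join_tokens(token_strings):
    return " ".join(tok for tok in token_strings if len(tok) > 0)

print(join_tokens(["Hello", "", "world"]))  # -> "Hello world"
```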

The current code by default sets the LlamaTokenizer to use Llama's EOS token as the pad token, except for the Llama-2 chat class above. > And if you still need padding, then it's better to update...
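For illustration, a minimal sketch of that default (the model name is an assumption; the exact behaviour depends on the tokenizer class):

```python
from transformers import AutoTokenizer

# Assumed model name, for illustration only.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Llama tokenizers ship without a pad token; reusing EOS as the pad token
# lets batched padding work without resizing the embedding table.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
```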

Hey @artemdinaburg, sorry for bringing up an old topic. The information here helped me fix another issue we had with `token_type_ids`. From the glossary https://huggingface.co/docs/transformers/glossary#token-type-ids, it seems to be...
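As a rough illustration of what that glossary entry describes (BERT is chosen here only as an example of a model that uses segment ids):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")

# Encoding a sentence pair yields token_type_ids marking which segment
# each token belongs to: 0 for the first sentence, 1 for the second.
enc = tok("How are you?", "I am fine.")
print(enc["token_type_ids"])  # e.g. [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```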

@MeDott29, you may be having FlashAttention (FA) issues: `RuntimeError: FlashAttention only supports Ampere GPUs or newer.` @amitagh, did you try other example configs?
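A quick way to check for that (a sketch, assuming a single CUDA device at index 0): FlashAttention needs compute capability 8.0 (Ampere) or newer.

```python
import torch

# Ampere GPUs report compute capability major version 8; older cards
# (e.g. T4, V100) trip the FlashAttention error above.
major, minor = torch.cuda.get_device_capability(0)
if major < 8:
    print(f"Compute capability {major}.{minor}: disable flash attention in the config.")
```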

@chdaesung @jaydeepthik , sorry, I missed the notifications. Do you have a sample Colab notebook? I was able to run successfully on Colab these past weeks.

Hey, this is very interesting. Should there be some full run comparisons to make sure that there is no loss in performance?

FSDP for Mistral uses `fsdp_transformer_layer_cls_to_wrap: MistralDecoderLayer`. If you would like to contribute an example, please feel free to make a PR.
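For context, a hedged sketch of what that config key points at in plain PyTorch FSDP terms (requires a `transformers` version that includes Mistral; this is illustrative, not axolotl's internal wiring):

```python
import functools

from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy
from transformers.models.mistral.modeling_mistral import MistralDecoderLayer

# An FSDP auto-wrap policy that shards the model at Mistral's
# decoder-layer boundary, matching the config value above.
auto_wrap_policy = functools.partial(
    transformer_auto_wrap_policy,
    transformer_layer_cls={MistralDecoderLayer},
)
```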

Is it due to a dtype mismatch?
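For context, a toy example of the kind of dtype mismatch meant here (purely illustrative):

```python
import torch

a = torch.randn(2, 2, dtype=torch.float16)
b = torch.randn(2, 2, dtype=torch.float32)

# a @ b would raise a dtype mismatch ("expected ... Half but found Float");
# casting to a common dtype first avoids it.
c = a.float() @ b
print(c.dtype)  # torch.float32
```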