NanoCode012

Results: 180 comments by NanoCode012

You may try the datasets in the example configs for testing, though they're a bit small.

With the addition of Fuyu to transformers, axolotl should inherently support it. `sample_packing` and `flash_attention` would not work, however, as Fuyu's modeling code does not support them.
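As a sketch, a config for such a model might look like this (the model path is an assumption, not a tested recipe; the flags themselves are standard axolotl options):

```yaml
# Hypothetical config sketch — base_model is an assumption.
base_model: adept/fuyu-8b
sample_packing: false   # Fuyu's modeling code does not support packing
flash_attention: false  # likewise unsupported for this architecture
```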

Hm, that's weird. I have nvidia-smi showing `CUDA 12.0` on the host, and I can run `python -m bitsandbytes` successfully in docker. If you have the axolotl repo cloned, do you...

Sorry for the late reply.

> I can run python -m bitsandbytes successfully, though it says that it is targeting CUDA 11.8 (BNB_CUDA_VERSION=118)

Axolotl targets CUDA 11.8 in the default image. You...
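For reference, the override mentioned above is just an environment variable (assuming bitsandbytes is installed in the container; exact versions on your machine may differ):

```shell
# Ask bitsandbytes to load its CUDA 11.8 binaries even if the host
# driver reports a newer CUDA version (e.g. 12.0 from nvidia-smi).
export BNB_CUDA_VERSION=118
# Run bitsandbytes' built-in self-check to confirm it loads correctly.
python -m bitsandbytes
```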

Hi, sorry for the late reply. `eval_table_size` is unfortunately not working as well as expected. That's why it's disabled by default.
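For reference, the option is toggled in the config like this (a sketch; the row count shown is an arbitrary example):

```yaml
# eval_table_size controls how many eval predictions are logged as a
# table. It is disabled by default; setting a positive value enables
# the feature, with the caveats mentioned above.
eval_table_size: 5
```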

@ysawej, sorry for that. It seems that this feature is no longer properly maintained. If you would like, would you be able to take a look and perhaps submit...

Could you check that your prompt format is the same? Your loss also starts quite high.

I did not know that using JSON would cause such an issue. That sounds weird. I will close this issue; please re-open if the problem recurs.

Sorry about that; the issue has been reopened. Could you please provide an example config where json does not work? The dataset handler for json and jsonl is the same...
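A minimal sketch of a local-file datasets entry (the path and prompt type here are placeholders, not taken from the report):

```yaml
datasets:
  # Local file loading: .json and .jsonl go through the same handler,
  # so either extension should behave identically.
  - path: data/train.jsonl   # placeholder path
    type: alpaca             # assumed prompt format
```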

@l3utterfly , do you perhaps still have the offending dataset to share or some sample of it for reproducing this?