
Llama-3 now supported

Open danielhanchen opened this issue 10 months ago • 5 comments

Colab for Llama-3 8b: https://colab.research.google.com/drive/135ced7oHytdxu3N2DNe1Z0kqjyYIkDXp?usp=sharing
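
Loading follows the same FastLanguageModel pattern as the other unsloth notebooks; a minimal sketch (the checkpoint name and max_seq_length are illustrative - match them to your setup):

from unsloth import FastLanguageModel

# Load Llama-3 8B in 4-bit; dtype=None auto-detects float16 vs bfloat16.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/llama-3-8b-bnb-4bit",  # illustrative 4-bit checkpoint
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True,
)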

danielhanchen avatar Apr 18 '24 20:04 danielhanchen

Edit 2: I'm not sure why it worked that time, but it's back. Probably something to do with my environment?

~~Edit: running pip install --no-deps packaging ninja einops flash-attn xformers trl peft accelerate bitsandbytes over my conda environment fixed it.~~


Thanks @danielhanchen!

I changed the model from Mistral to Llama-3 in my training script, based on the ChatML notebook from the README (gist of code and full log), and it went from working to crashing on trainer.train():

  File "/home/tnunamak/applications/miniconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/models/llama.py", line 432, in LlamaDecoderLayer_fast_forward
    hidden_states = fast_rms_layernorm(self.input_layernorm, hidden_states)
  File "/home/tnunamak/applications/miniconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/kernels/rms_layernorm.py", line 190, in fast_rms_layernorm
    out = Fast_RMS_Layernorm.apply(X, W, eps, gemma)
  File "/home/tnunamak/applications/miniconda3/envs/unsloth_env/lib/python3.10/site-packages/torch/autograd/function.py", line 553, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/tnunamak/applications/miniconda3/envs/unsloth_env/lib/python3.10/site-packages/unsloth/kernels/rms_layernorm.py", line 144, in forward
    fx[(n_rows,)](
  File "/home/tnunamak/applications/miniconda3/envs/unsloth_env/lib/python3.10/site-packages/triton/runtime/jit.py", line 550, in run
    bin.c_wrapper(
  File "/home/tnunamak/applications/miniconda3/envs/unsloth_env/lib/python3.10/site-packages/triton/compiler/compiler.py", line 692, in __getattribute__
    self._init_handles()
  File "/home/tnunamak/applications/miniconda3/envs/unsloth_env/lib/python3.10/site-packages/triton/compiler/compiler.py", line 683, in _init_handles
    mod, func, n_regs, n_spills = fn_load_binary(self.metadata["name"], self.asm[bin_path], self.shared, device)
RuntimeError: Triton Error [CUDA]: device-side assert triggered
Aborted (core dumped)
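
(For context: a device-side assert from a Triton kernel is usually an out-of-bounds index, e.g. a token ID larger than the model's embedding table. A quick check - a sketch assuming the model and trainer from the notebook are in scope:)

import os
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"  # set before CUDA initializes so the failing kernel reports synchronously

# Verify no token ID exceeds the embedding table size.
vocab_size = model.get_input_embeddings().weight.shape[0]
for batch in trainer.get_train_dataloader():
    bad = (batch["input_ids"] < 0) | (batch["input_ids"] >= vocab_size)
    if bad.any():
        print("Out-of-range token IDs:", batch["input_ids"][bad].unique().tolist())
        break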

Any ideas?

tnunamak avatar Apr 18 '24 21:04 tnunamak

@tnunamak Yes, I can reproduce it - sorry, working on a fix!

danielhanchen avatar Apr 21 '24 02:04 danielhanchen

I had this same issue when trying to work around the eos token (<|eot_id|>) issue manually.
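
In case it's useful, that kind of manual workaround typically looks like this (a sketch, assuming the stock Llama-3 tokenizer - <|eot_id|> is already in the vocab, so no embedding resize is needed):

from transformers import AutoTokenizer

# Map <|eot_id|> as the EOS (and pad) token so turns terminate correctly.
# Checkpoint name is illustrative.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
tokenizer.eos_token = "<|eot_id|>"
tokenizer.pad_token = tokenizer.eos_token
print(tokenizer.eos_token_id)  # 128009 in the stock Llama-3 vocab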

sion42x avatar Apr 21 '24 15:04 sion42x

@sion42x Yep, working on a fix - I think I'll push it today - apologies for the issue!

danielhanchen avatar Apr 21 '24 17:04 danielhanchen

@tnunamak @sion42x Fixed!! On a local machine, please reinstall, i.e. via:

pip uninstall unsloth -y
pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git

For Colab / Kaggle, restart and run again :) Sorry about the issue!

danielhanchen avatar Apr 21 '24 19:04 danielhanchen

I think the issue still exists in the new update. Did anyone solve the problem? Thanks

ekmekovski avatar Jun 05 '24 14:06 ekmekovski

@emreekmekcioglu1 Did you try reinstalling?

pip uninstall unsloth -y
pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git
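
You can confirm the fresh build is the one being imported with:

pip show unsloth
python -c "import unsloth; print(unsloth.__file__)"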

danielhanchen avatar Jun 06 '24 16:06 danielhanchen

> I think the issue still exists in the new update. Did anyone solve the problem? Thanks

It's possible the issue is in the engine you're using to run the model - not everything runs Llama 3 well. You can see this with the GGUF I fine-tuned with unsloth: https://huggingface.co/yaystevek/llama-3-8b-Instruct-OpenHermes-2.5-QLoRA-GGUF

It works great through llama.cpp directly and other tools built on it, but Ollama, for example, still had infinite-generation issues.
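
A common workaround on the Ollama side is to declare Llama-3's stop tokens explicitly in the Modelfile (a sketch, with an illustrative GGUF filename):

FROM ./llama-3-8b-instruct-finetune.Q4_K_M.gguf
PARAMETER stop "<|eot_id|>"
PARAMETER stop "<|end_of_text|>"

then build it with: ollama create my-llama3 -f Modelfile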

sion42x avatar Jun 06 '24 16:06 sion42x

Hmm, has Ollama updated their implementation?

danielhanchen avatar Jun 09 '24 14:06 danielhanchen