unslothai/unsloth: Fine-tuning & Reinforcement Learning for LLMs. 🦥...

Finetune Llama 3.3, Mistral, Phi-4, Qwen 2.5 & Gemma 2x faster with 80% less memory!

✨ Finetune for Free

All notebooks are beginner friendly! Add your dataset, click "Run All", and you'll get a 2x faster finetuned model which can be exported to GGUF, Ollama, vLLM or uploaded to Hugging Face.

Unsloth supports	Free Notebooks	Performance	Memory use
Llama 3.2 (3B)	▶️ Start for free	2x faster	70% less
GRPO (R1 reasoning)	▶️ Start for free	2x faster	80% less
Phi-4 (14B)	▶️ Start for free	2x faster	70% less
Llama 3.2 Vision (11B)	▶️ Start for free	2x faster	50% less
Llama 3.1 (8B)	▶️ Start for free	2x faster	70% less
Gemma 2 (9B)	▶️ Start for free	2x faster	70% less
Qwen 2.5 (7B)	▶️ Start for free	2x faster	70% less
Mistral v0.3 (7B)	▶️ Start for free	2.2x faster	75% less
Ollama	▶️ Start for free	1.9x faster	60% less
DPO Zephyr	▶️ Start for free	1.9x faster	50% less

See all our notebooks and all our models
Kaggle Notebooks for Llama 3.2 Kaggle notebook, Llama 3.1 (8B), Gemma 2 (9B), Mistral (7B)
Run notebooks for Llama 3.2 conversational, Llama 3.1 conversational and Mistral v0.3 ChatML
This continued pretraining notebook is for learning another language
Click here for detailed documentation for Unsloth.

⚡ Quickstart

Install with pip (recommended) for Linux devices:

pip install unsloth

For Windows install instructions, see here.

🦥 Unsloth.ai News

📣 NEW! Introducing Long-context Reasoning (GRPO) in Unsloth. You can now reproduce DeepSeek-R1's "aha" moment with just 5GB VRAM. Transform Llama, Phi, Mistral etc. into reasoning LLMs!
📣 NEW! DeepSeek-R1 - the most powerful open reasoning models with Llama & Qwen distillations. Run or fine-tune them now! More details: unsloth.ai/blog/deepseek-r1. All model uploads: here.
📣 NEW! Phi-4 by Microsoft is now supported. We also fixed bugs in Phi-4 and uploaded GGUFs, 4-bit. Try the Phi-4 Colab notebook
📣 NEW! Llama 3.3 (70B), Meta's latest model is supported.
📣 NEW! We worked with Apple to add Cut Cross Entropy. Unsloth now supports 89K context for Meta's Llama 3.3 (70B) on a 80GB GPU - 13x longer than HF+FA2. For Llama 3.1 (8B), Unsloth enables 342K context, surpassing its native 128K support.
📣 Introducing Unsloth Dynamic 4-bit Quantization! We dynamically opt not to quantize certain parameters and this greatly increases accuracy while only using <10% more VRAM than BnB 4-bit. See our collection on Hugging Face here.
📣 Vision models now supported! Llama 3.2 Vision (11B), Qwen 2.5 VL (7B) and Pixtral (12B) 2409

Click for more news

📣 We found and helped fix a gradient accumulation bug! Please update Unsloth and transformers.
📣 Try out Chat interface!
📣 NEW! Qwen-2.5 including Coder models are now supported with bugfixes. 14b fits in a Colab GPU! Qwen 2.5 conversational notebook
📣 NEW! Mistral Small 22b notebook finetuning fits in under 16GB of VRAM!
📣 NEW! pip install unsloth now works! Head over to pypi to check it out! This allows non git pull installs. Use pip install unsloth[colab-new] for non dependency installs.
📣 NEW! Continued Pretraining notebook for other languages like Korean!
📣 2x faster inference added for all our models
📣 We cut memory usage by a further 30% and now support 4x longer context windows!

🔗 Links and Resources

Type	Links
📚 Documentation & Wiki	Read Our Docs
Twitter (aka X)	Follow us on X
💾 Installation	Pip install
🔮 Our Models	Unsloth Releases
✍️ Blog	Read our Blogs
Reddit	Join our Reddit page

⭐ Key Features

All kernels written in OpenAI's Triton language. Manual backprop engine.
0% loss in accuracy - no approximation methods - all exact.
No change of hardware. Supports NVIDIA GPUs since 2018+. Minimum CUDA Capability 7.0 (V100, T4, Titan V, RTX 20, 30, 40x, A100, H100, L40 etc) Check your GPU! GTX 1070, 1080 works, but is slow.
Works on Linux and Windows
Supports 4bit and 16bit QLoRA / LoRA finetuning via bitsandbytes.
If you trained a model with 🦥Unsloth, you can use this cool sticker!

💾 Install Unsloth

You can also see our documentation for more detailed installation and updating instructions here.

Pip Installation

Install with pip (recommended) for Linux devices:

pip install unsloth

See here for advanced pip install instructions.

Windows Installation

[!warning] Python 3.13 does not support Unsloth. Use 3.12, 3.11 or 3.10

Install NVIDIA Video Driver: You should install the latest version of your GPUs driver. Download drivers here: NVIDIA GPU Drive.
Install Visual Studio C++: You will need Visual Studio, with C++ installed. By default, C++ is not installed with Visual Studio, so make sure you select all of the C++ options. Also select options for Windows 10/11 SDK. For more detailed instructions, see here.
Install CUDA Toolkit: Follow the instructions to install CUDA Toolkit.
Install PyTorch: You will need the correct version of PyTorch that is compatibile with your CUDA drivers, so make sure to select them carefully. Install PyTorch.
Install Unsloth:

pip install "unsloth[windows] @ git+https://github.com/unslothai/unsloth.git"

Notes

To run Unsloth directly on Windows:

Install Triton from this Windows fork and follow the instructions here (be aware that the Windows fork requires PyTorch >= 2.4 and CUDA 12)
In the SFTTrainer, set dataset_num_proc=1 to avoid a crashing issue:

trainer = SFTTrainer(
    dataset_num_proc=1,
    ...
)

Advanced/Troubleshooting

For advanced installation instructions or if you see weird errors during installations:

Install torch and triton. Go to https://pytorch.org to install it. For example pip install torch torchvision torchaudio triton
Confirm if CUDA is installated correctly. Try nvcc. If that fails, you need to install cudatoolkit or CUDA drivers.
Install xformers manually. You can try installing vllm and seeing if vllm succeeds. Check if xformers succeeded with python -m xformers.info Go to https://github.com/facebookresearch/xformers. Another option is to install flash-attn for Ampere GPUs.
Double check that your versions of Python, CUDA, CUDNN, torch, triton, and xformers are compatible with one another. The PyTorch Compatibility Matrix may be useful.
Finally, install bitsandbytes and check it with python -m bitsandbytes

Conda Installation (Optional)

⚠️Only use Conda if you have it. If not, use Pip. Select either pytorch-cuda=11.8,12.1 for CUDA 11.8 or CUDA 12.1. We support python=3.10,3.11,3.12.

conda create --name unsloth_env \
    python=3.11 \
    pytorch-cuda=12.1 \
    pytorch cudatoolkit xformers -c pytorch -c nvidia -c xformers \
    -y
conda activate unsloth_env

pip install unsloth

If you're looking to install Conda in a Linux environment, read here, or run the below 🔽

mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash
~/miniconda3/bin/conda init zsh

Advanced Pip Installation

⚠️Do **NOT** use this if you have Conda. Pip is a bit more complex since there are dependency issues. The pip command is different for torch 2.2,2.3,2.4,2.5 and CUDA versions.

For other torch versions, we support torch211, torch212, torch220, torch230, torch240 and for CUDA versions, we support cu118 and cu121 and cu124. For Ampere devices (A100, H100, RTX3090) and above, use cu118-ampere or cu121-ampere or cu124-ampere.

For example, if you have torch 2.4 and CUDA 12.1, use:

pip install --upgrade pip
pip install "unsloth[cu121-torch240] @ git+https://github.com/unslothai/unsloth.git"

Another example, if you have torch 2.5 and CUDA 12.4, use:

pip install --upgrade pip
pip install "unsloth[cu124-torch250] @ git+https://github.com/unslothai/unsloth.git"

And other examples:

pip install "unsloth[cu121-ampere-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu118-ampere-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121-torch240] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu118-torch240] @ git+https://github.com/unslothai/unsloth.git"

pip install "unsloth[cu121-torch230] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu121-ampere-torch230] @ git+https://github.com/unslothai/unsloth.git"

pip install "unsloth[cu121-torch250] @ git+https://github.com/unslothai/unsloth.git"
pip install "unsloth[cu124-ampere-torch250] @ git+https://github.com/unslothai/unsloth.git"

Or, run the below in a terminal to get the optimal pip installation command:

wget -qO- https://raw.githubusercontent.com/unslothai/unsloth/main/unsloth/_auto_install.py | python -

Or, run the below manually in a Python REPL:

try: import torch
except: raise ImportError('Install torch via `pip install torch`')
from packaging.version import Version as V
v = V(torch.__version__)
cuda = str(torch.version.cuda)
is_ampere = torch.cuda.get_device_capability()[0] >= 8
if cuda != "12.1" and cuda != "11.8" and cuda != "12.4": raise RuntimeError(f"CUDA = {cuda} not supported!")
if   v <= V('2.1.0'): raise RuntimeError(f"Torch = {v} too old!")
elif v <= V('2.1.1'): x = 'cu{}{}-torch211'
elif v <= V('2.1.2'): x = 'cu{}{}-torch212'
elif v  < V('2.3.0'): x = 'cu{}{}-torch220'
elif v  < V('2.4.0'): x = 'cu{}{}-torch230'
elif v  < V('2.5.0'): x = 'cu{}{}-torch240'
elif v  < V('2.6.0'): x = 'cu{}{}-torch250'
else: raise RuntimeError(f"Torch = {v} too new!")
x = x.format(cuda.replace(".", ""), "-ampere" if is_ampere else "")
print(f'pip install --upgrade pip && pip install "unsloth[{x}] @ git+https://github.com/unslothai/unsloth.git"')

📜 Documentation

Go to our official Documentation for saving to GGUF, checkpointing, evaluation and more!
We support Huggingface's TRL, Trainer, Seq2SeqTrainer or even Pytorch code!
We're in 🤗Hugging Face's official docs! Check out the SFT docs and DPO docs!
If you want to download models from the ModelScope community, please use an environment variable: UNSLOTH_USE_MODELSCOPE=1, and install the modelscope library by: pip install modelscope -U.

unsloth_cli.py also supports UNSLOTH_USE_MODELSCOPE=1 to download models and datasets. please remember to use the model and dataset id in the ModelScope community.

from unsloth import FastLanguageModel 
import torch
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset
max_seq_length = 2048 # Supports RoPE Scaling interally, so choose any!
# Get LAION dataset
url = "https://huggingface.co/datasets/laion/OIG/resolve/main/unified_chip2.jsonl"
dataset = load_dataset("json", data_files = {"train" : url}, split = "train")

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",      # Llama-3.1 2x faster
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    "unsloth/Meta-Llama-3.1-70B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-405B-bnb-4bit",    # 4bit for 405b!
    "unsloth/Mistral-Small-Instruct-2409",     # Mistral 22b 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/Phi-3.5-mini-instruct",           # Phi-3.5 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/gemma-2-9b-bnb-4bit",
    "unsloth/gemma-2-27b-bnb-4bit",            # Gemma 2x faster!

    "unsloth/Llama-3.2-1B-bnb-4bit",           # NEW! Llama 3.2 models
    "unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
    "unsloth/Llama-3.2-3B-bnb-4bit",
    "unsloth/Llama-3.2-3B-Instruct-bnb-4bit",

    "unsloth/Llama-3.3-70B-Instruct-bnb-4bit" # NEW! Llama 3.3 70B!
] # More models at https://huggingface.co/unsloth

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-1B",
    max_seq_length = max_seq_length,
    load_in_4bit = True,
)

# Do model patching and add fast LoRA weights
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    max_seq_length = max_seq_length,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

trainer = SFTTrainer(
    model = model,
    train_dataset = dataset,
    tokenizer = tokenizer,
    args = SFTConfig(
        dataset_text_field = "text",
        max_seq_length = max_seq_length,
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,
        warmup_steps = 10,
        max_steps = 60,
        logging_steps = 1,
        output_dir = "outputs",
        optim = "adamw_8bit",
        seed = 3407,
    ),
)
trainer.train()

# Go to https://github.com/unslothai/unsloth/wiki for advanced tips like
# (1) Saving to GGUF / merging to 16bit for vLLM
# (2) Continued training from a saved LoRA adapter
# (3) Adding an evaluation loop / OOMs
# (4) Customized chat templates

💡 Reinforcement Learning

RL including DPO, GRPO, PPO, Reward Modelling, Online DPO all work with Unsloth. We're in 🤗Hugging Face's official docs! We're on the SFT docs and the DPO docs! List of RL notebooks:

ORPO notebook: Link
DPO Zephyr notebook: Link
KTO notebook: Link
SimPO notebook: Link

Click for DPO code

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0" # Optional set GPU device ID

from unsloth import FastLanguageModel
import torch
from trl import DPOTrainer, DPOConfig
max_seq_length = 2048

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/zephyr-sft-bnb-4bit",
    max_seq_length = max_seq_length,
    load_in_4bit = True,
)

# Do model patching and add fast LoRA weights
model = FastLanguageModel.get_peft_model(
    model,
    r = 64,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 64,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    max_seq_length = max_seq_length,
)

dpo_trainer = DPOTrainer(
    model = model,
    ref_model = None,
    train_dataset = YOUR_DATASET_HERE,
    # eval_dataset = YOUR_DATASET_HERE,
    tokenizer = tokenizer,
    args = DPOConfig(
        per_device_train_batch_size = 4,
        gradient_accumulation_steps = 8,
        warmup_ratio = 0.1,
        num_train_epochs = 3,
        logging_steps = 1,
        optim = "adamw_8bit",
        seed = 42,
        output_dir = "outputs",
        max_length = 1024,
        max_prompt_length = 512,
        beta = 0.1,
    ),
)
dpo_trainer.train()

🥇 Performance Benchmarking

For our most detailed benchmarks, read our Llama 3.3 Blog.
Benchmarking of Unsloth was also conducted by 🤗Hugging Face.

We tested using the Alpaca Dataset, a batch size of 2, gradient accumulation steps of 4, rank = 32, and applied QLoRA on all linear layers (q, k, v, o, gate, up, down):

Model	VRAM	🦥 Unsloth speed	🦥 VRAM reduction	🦥 Longer context	😊 Hugging Face + FA2
Llama 3.3 (70B)	80GB	2x	>75%	13x longer	1x
Llama 3.1 (8B)	80GB	2x	>70%	12x longer	1x

Context length benchmarks

Llama 3.1 (8B) max. context length

We tested Llama 3.1 (8B) Instruct and did 4bit QLoRA on all linear layers (Q, K, V, O, gate, up and down) with rank = 32 with a batch size of 1. We padded all sequences to a certain maximum sequence length to mimic long context finetuning workloads.

GPU VRAM	🦥Unsloth context length	Hugging Face + FA2
8 GB	2,972	OOM
12 GB	21,848	932
16 GB	40,724	2,551
24 GB	78,475	5,789
40 GB	153,977	12,264
48 GB	191,728	15,502
80 GB	342,733	28,454

Llama 3.3 (70B) max. context length

We tested Llama 3.3 (70B) Instruct on a 80GB A100 and did 4bit QLoRA on all linear layers (Q, K, V, O, gate, up and down) with rank = 32 with a batch size of 1. We padded all sequences to a certain maximum sequence length to mimic long context finetuning workloads.

GPU VRAM	🦥Unsloth context length	Hugging Face + FA2
48 GB	12,106	OOM
80 GB	89,389	6,916

Citation

You can cite the Unsloth repo as follows:

@software{unsloth,
  author = {Daniel Han, Michael Han and Unsloth team},
  title = {Unsloth},
  url = {http://github.com/unslothai/unsloth},
  year = {2023}
}

Thank You to

Hugging Face's TRL library which serves as the basis foundation for Unsloth
Erik for his help adding Apple's ML Cross Entropy in Unsloth
HuyNguyen-hust for making RoPE Embeddings 28% faster
RandomInternetPreson for confirming WSL support
152334H for experimental DPO support

unsloth
unsloth copied to clipboard

Metadata

Finetune Llama 3.3, Mistral, Phi-4, Qwen 2.5 & Gemma 2x faster with 80% less memory!

✨ Finetune for Free

⚡ Quickstart

🦥 Unsloth.ai News

🔗 Links and Resources

⭐ Key Features

💾 Install Unsloth

Pip Installation

Windows Installation

Notes

Advanced/Troubleshooting

Conda Installation (Optional)

Advanced Pip Installation

📜 Documentation

💡 Reinforcement Learning

🥇 Performance Benchmarking

Context length benchmarks

Llama 3.1 (8B) max. context length

Llama 3.3 (70B) max. context length

Citation

Thank You to

← Metadata

Owner

Metadata

unsloth unsloth copied to clipboard

Metadata

Finetune Llama 3.3, Mistral, Phi-4, Qwen 2.5 & Gemma 2x faster with 80% less memory!

✨ Finetune for Free

⚡ Quickstart

🦥 Unsloth.ai News

🔗 Links and Resources

⭐ Key Features

💾 Install Unsloth

Pip Installation

Windows Installation

Notes

Advanced/Troubleshooting

Conda Installation (Optional)

Advanced Pip Installation

📜 Documentation

💡 Reinforcement Learning

🥇 Performance Benchmarking

Context length benchmarks

Llama 3.1 (8B) max. context length

Llama 3.3 (70B) max. context length

Citation

Thank You to

← Metadata

Owner

Metadata

unsloth
unsloth copied to clipboard