Adding full finetuning
As before, this pull request adds full fine-tuning support. The files changed are lora.py, trainer.py, and LORA.md (for the new arguments).
The new training arguments are:
python -m mlx_lm.lora \
--model \
--train \
--fine-tune-type full \
--data \
--iters 100 \
--batch-size 1 \
--val-batches 1 \
--adapter-path
To change the fine-tuning method, set --fine-tune-type to lora, dora, or full; the default is lora. The path where the adapters or the full model weights are stored is still given by --adapter-path.
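For context, here is a minimal sketch of how the new flag could be wired up on the command line. This is illustrative only, not the actual lora.py parser, and the help strings are my own wording:

import argparse

# Illustrative sketch of the new CLI options (not the actual lora.py source).
parser = argparse.ArgumentParser(description="LoRA/DoRA/full fine-tuning")
parser.add_argument(
    "--fine-tune-type",
    choices=["lora", "dora", "full"],
    default="lora",
    help="Fine-tuning method to use (default: lora).",
)
parser.add_argument(
    "--adapter-path",
    default="adapters",
    help="Directory where the adapters or full model weights are saved.",
)
args = parser.parse_args()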
Tested with Gemma, Mistral, and Llama 3 (tiny versions, because I only have an 8 GB MacBook Air). Here is an example with Gemma:
FULL
python -m mlx_lm.lora \
--model /Users/gokdenizgulmez/Library/Mobile\ Documents/com\~apple\~CloudDocs/Transformer\ Models/Safetensors/tiny-random-GemmaForCausalLM \
--train \
--fine-tune-type full \
--data /Users/gokdenizgulmez/Library/Mobile\ Documents/com\~apple\~CloudDocs/Datastes/data_tyni \
--iters 10 \
--batch-size 1 \
--val-batches 1 \
--adapter-path /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain
Loading pretrained model
Loading datasets
Training
Training full model weights.
Trainable parameters: 100.000% (2.049M/2.049M)
Starting training..., iters: 10
Iter 1: Val loss 12.459, Val took 0.455s
Iter 2: Saved model checkpoint weights to /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain/model.safetensors and /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain/0000002_checkpoint.safetensors.
Iter 4: Saved model checkpoint weights to /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain/model.safetensors and /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain/0000004_checkpoint.safetensors.
Iter 6: Saved model checkpoint weights to /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain/model.safetensors and /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain/0000006_checkpoint.safetensors.
Iter 8: Saved model checkpoint weights to /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain/model.safetensors and /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain/0000008_checkpoint.safetensors.
Iter 10: Val loss 12.456, Val took 0.346s
Iter 10: Train loss 12.456, Learning Rate 1.000e-05, It/sec 17.743, Tokens/sec 17372.541, Trained Tokens 9791, Peak mem 6.281 GB
Iter 10: Saved model checkpoint weights to /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain/model.safetensors and /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain/0000010_checkpoint.safetensors.
Saved final full model weights to /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain/model.safetensors.
LORA
python -m mlx_lm.lora \
--model /Users/gokdenizgulmez/Library/Mobile\ Documents/com\~apple\~CloudDocs/Transformer\ Models/Safetensors/tiny-random-GemmaForCausalLM \
--train \
--fine-tune-type lora \
--data /Users/gokdenizgulmez/Library/Mobile\ Documents/com\~apple\~CloudDocs/Datastes/data_tyni \
--iters 10 \
--batch-size 1 \
--val-batches 1 \
--adapter-path /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain-lora
Loading pretrained model
Loading datasets
Training
Training model with LoRA.
Trainable parameters: 0.022% (0.000M/2.049M)
Starting training..., iters: 10
Iter 1: Val loss 12.459, Val took 0.372s
Iter 2: Saved adapter weights to /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain-lora/adapter.safetensors and /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain-lora/0000002_adapters.safetensors.
Iter 4: Saved adapter weights to /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain-lora/adapter.safetensors and /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain-lora/0000004_adapters.safetensors.
Iter 6: Saved adapter weights to /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain-lora/adapter.safetensors and /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain-lora/0000006_adapters.safetensors.
Iter 8: Saved adapter weights to /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain-lora/adapter.safetensors and /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain-lora/0000008_adapters.safetensors.
Iter 10: Val loss 12.457, Val took 0.219s
Iter 10: Train loss 12.457, Learning Rate 1.000e-05, It/sec 25.792, Tokens/sec 25253.294, Trained Tokens 9791, Peak mem 6.128 GB
Iter 10: Saved adapter weights to /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain-lora/adapter.safetensors and /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain-lora/0000010_adapters.safetensors.
Saved final adapter weights to /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain-lora/adapter.safetensors.
DORA
python -m mlx_lm.lora \
--model /Users/gokdenizgulmez/Library/Mobile\ Documents/com\~apple\~CloudDocs/Transformer\ Models/Safetensors/tiny-random-GemmaForCausalLM \
--train \
--fine-tune-type dora \
--data /Users/gokdenizgulmez/Library/Mobile\ Documents/com\~apple\~CloudDocs/Datastes/data_tyni \
--iters 10 \
--batch-size 1 \
--val-batches 1 \
--adapter-path /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain-dora
Loading pretrained model
Loading datasets
Training
Training model with DoRA.
Trainable parameters: 0.023% (0.000M/2.049M)
Starting training..., iters: 10
Iter 1: Val loss 12.460, Val took 0.241s
Iter 2: Saved adapter weights to /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain-dora/adapter.safetensors and /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain-dora/0000002_adapters.safetensors.
Iter 4: Saved adapter weights to /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain-dora/adapter.safetensors and /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain-dora/0000004_adapters.safetensors.
Iter 6: Saved adapter weights to /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain-dora/adapter.safetensors and /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain-dora/0000006_adapters.safetensors.
Iter 8: Saved adapter weights to /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain-dora/adapter.safetensors and /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain-dora/0000008_adapters.safetensors.
Iter 10: Val loss 12.457, Val took 0.164s
Iter 10: Train loss 12.457, Learning Rate 1.000e-05, It/sec 26.589, Tokens/sec 26033.065, Trained Tokens 9791, Peak mem 6.129 GB
Iter 10: Saved adapter weights to /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain-dora/adapter.safetensors and /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain-dora/0000010_adapters.safetensors.
Saved final adapter weights to /Users/gokdenizgulmez/Desktop/tinyGemma-pretrain-dora/adapter.safetensors.
Thanks!! Will review shortly!
Perfect!
Any idea how to correct this error?
File ".../venv/lib/python3.8/site-packages/mlx/nn/utils.py", line 34, in wrapped_value_grad_fn
value, grad = value_grad_fn(model.trainable_parameters(), *args, **kwargs)
RuntimeError: QuantizedMatmul::vjp no gradient wrt the quantized matrix yet.
You can't fine-tune the quantized layers. You can use a fp16, bf16, or fp32 model for full fine-tuning. The half precision types need care to avoid numerical issues, so ymmv. If you want to use a quantized model you can do QLoRA.
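If the checkpoint you have is quantized, one way around this is to re-convert the original Hugging Face model to half precision and full fine-tune that instead. A minimal sketch, assuming the mlx_lm convert helper and its dtype argument as exposed in recent releases (the repo id and output path are placeholders):

from mlx_lm import convert

# Convert the Hugging Face model to bf16 MLX weights without quantizing,
# so every layer has gradients available for full fine-tuning.
convert(
    hf_path="my-org/my-model",      # placeholder repo id
    mlx_path="mlx_model_bf16",      # placeholder output directory
    quantize=False,
    dtype="bfloat16",
)

Then point --model at mlx_model_bf16 when launching the full fine-tune.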
OK. Thanks for the info.
I probably won't attempt full tuning of 16-bit models on my 2021 M1.
I was trying to fine tune a LoRA on mlx-community/DeepSeek-V2-Lite-Chat-4bit-mlx
but failed with this error:
File ".../llms/mlx_lm/tuner/utils.py", line 132, in linear_to_lora_layers
raise ValueError(f"Lora does not support {model.model_type}")
ValueError: Lora does not support deepseek_v2
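For reference, the check that raises here keeps a per-architecture list of linear layers to wrap with LoRA and rejects unknown model types. A rough, hypothetical sketch of that dispatch (the key names below are assumptions, not the actual tuner/utils.py source):

def linear_to_lora_layers(model, num_layers, config):
    # Each supported architecture maps to the projections that get LoRA-wrapped.
    if model.model_type in ("llama", "mistral", "gemma", "qwen2"):
        keys = ["self_attn.q_proj", "self_attn.v_proj"]
    else:
        # deepseek_v2 lands here until the architecture is added to the map.
        raise ValueError(f"Lora does not support {model.model_type}")
    return keys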
@Jonathan-Dobson here is a fix for that https://github.com/ml-explore/mlx-examples/pull/932. Will put it in a new pypi release once it lands.
#932 fixed the error and allows Fine Tuning to start now.
Given a --fine-tune-type full training and the saved model in the adapters directory,
When attempting to use generate.py like this:
python -m mlx_lm.generate \
--model mlx-community/Qwen2-0.5B \
--prompt $P \
--adapter-path adapters
The command fails with this error:
File ".../venv/lib/python3.8/site-packages/mlx/nn/layers/base.py", line 204, in load_weights
weights = list(mx.load(weights).items())
RuntimeError: [load_safetensors] Failed to open file adapters/adapters.safetensors
Here are the contents of adapters after the full fine tune with updated lora.py:
adapter_config.json
model.safetensors
It looks like running the full fine-tune type saves the weights as model.safetensors now, but generate.py still expects the original adapters.safetensors name.
Alternatively, running generate.py with the adapters/ path as the --model still causes this error:
File ".../llms/mlx_lm/utils.py", line 346, in load_config
with open(model_path / "config.json", "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'adapters/config.json'
It's not saving all the model files correctly yet. What I tried is copying the original config and tokenizer files into this adapters folder and then generating without the --adapter-path flag, passing the adapters path to --model instead; since full fine-tuning saves the full model weights, the adapter flag is only needed for LoRA fine-tuning. I'll push the fix later, thanks for the feedback. In the meantime, try it again after renaming model.safetensors to adapters.safetensors; that should swap the old model weights for the new ones.
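In Python terms, that first workaround looks roughly like this (the source path is a placeholder, and the tokenizer file names can differ per model):

import shutil
from pathlib import Path

src = Path("path/to/original/model")  # placeholder: the original MLX model directory
dst = Path("adapters")                # output directory of the full fine-tune

# Copy over the files generate.py expects to find next to the weights.
for name in ("config.json", "tokenizer.json", "tokenizer_config.json"):
    if (src / name).exists():
        shutil.copy(src / name, dst / name)

After that, generate with --model adapters and without --adapter-path.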
Hey I'm back and it's fixed.
It now saves the full model along with its needed files like config.json, the tokenizer, and so on.
You can now just generate with:
python -m mlx_lm.generate \
--model path/to/adapters \
--prompt $P
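The same thing works through the Python API as well, assuming mlx_lm's load and generate helpers (the prompt is just an example):

from mlx_lm import load, generate

# Load the fully fine-tuned weights directly as the model.
model, tokenizer = load("path/to/adapters")
print(generate(model, tokenizer, prompt="Hello", max_tokens=100))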
Hey @awni, I want to ask if I need to do or change something for it to be merged?
Apologies for the delay. Let me take a look this week and get back to you!
@awni Thanks for the quick reply! I removed unnecessary code, merged the LoRA and DoRA cases, and added the new LoRA layers. Sorry for the bad code; I think the merge was not done correctly.
@awni @Goekdeniz-Guelmez it could be nice to add a FINETUNE.md to mlx-examples/llms/mlx_lm imho
A nice and detailed description of how to fine-tune is already in the LORA.md file here. Do you mean there should be a separate and more detailed explanation?
Yeah, I meant a separate file. I think full fine-tuning warrants its own guide, as it's a long-wanted feature for MLX. Great work btw 👏