I have tried all the Colab examples:

https://colab.research.google.com/drive/1j0N4XTY1zXXy7mPAhOC1_gMYZ2F2EBlk?usp=sharing
https://colab.research.google.com/drive/1whHb54GNZMrNxIsi2wm2EY_-Pvo2QyKh?usp=sharing
https://colab.research.google.com/drive/1K9ZrdwvZRE96qGkCq_e88FgV3MLnymQq?usp=sharing

They all crash the session with an out-of-memory error when I try to merge the model at this line:

# Save locally to 16bit
model.save_pretrained_merged("unsloth_finetune", tokenizer)

I also tried locally on an RTX 3090 with 24 GB of VRAM and got the same out-of-memory crash:


 warnings.warn("Setting `save_embedding_layers` to `True` as embedding layers found in `target_modules`.")
{'train_runtime': 162.7664, 'train_samples_per_second': 1.475, 'train_steps_per_second': 0.184, 'train_loss': 1.351119724412759, 'epoch': 0.15}
100%|██████████| 30/30 [02:42<00:00,  5.43s/it]
/root/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/peft/utils/save_and_load.py:230: UserWarning: Setting `save_embedding_layers` to `True` as embedding layers found in `target_modules`.
  warnings.warn("Setting `save_embedding_layers` to `True` as embedding layers found in `target_modules`.")
model-00001-of-00004.safetensors: 100%|█████████▉| 4.97G/4.97G [18:40<00:00, 4.43MB/s]
model-00002-of-00004.safetensors: 100%|█████████▉| 4.99G/4.99G [18:12<00:00, 4.57MB/s]
model-00003-of-00004.safetensors: 100%|█████████▉| 4.93G/4.93G [17:59<00:00, 4.57MB/s]
model-00004-of-00004.safetensors: 100%|█████████▉| 1.69G/1.69G [06:27<00:00, 4.37MB/s]
Unsloth: Merging weights into 16bit:   0%|          | 0/4 [00:31<?, ?it/s]
Traceback (most recent call last):
  File "/mnt/e/Projects/ai_recognition/unsloth_finetune.py", line 241, in <module>
    model.save_pretrained_merged(output_path, tokenizer)  
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/unsloth/save.py", line 2185, in unsloth_generic_save_pretrained_merged
    unsloth_generic_save(**arguments)
  File "/root/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/unsloth/save.py", line 2132, in unsloth_generic_save
    merge_and_overwrite_lora(
  File "/root/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/unsloth_zoo/saving_utils.py", line 581, in merge_and_overwrite_lora
    n_saved_modules += _merge_and_overwrite_lora(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/miniconda3/envs/unsloth_env/lib/python3.11/site-packages/unsloth_zoo/saving_utils.py", line 328, in _merge_and_overwrite_lora
    W = W.to(device = "cpu", dtype = output_dtype, non_blocking = True)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
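
The error message itself suggests rerunning with CUDA_LAUNCH_BLOCKING=1 so that the reported stack trace matches the failing call. A minimal sketch of doing that from inside the script, assuming the variable is set before PyTorch initializes CUDA (the safest place is the very top of the file, before `import torch`):

```python
import os

# Make CUDA kernel launches synchronous so errors surface at the call that
# caused them. This has to be in the environment before CUDA is initialized,
# so set it before importing torch or unsloth.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import afterwards, then rerun the failing merge call
```

This only makes the traceback more trustworthy; it is not expected to change memory usage or to fix the crash.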

Jan 28 '25 07:01 alkollo

Oh, that's not good. Hmm, I'll have to auto-check the memory usage before merging.
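
For anyone who wants a manual check in the meantime, here is a rough sketch of what a pre-merge memory check could look like. It is not Unsloth's implementation. The shard sizes are copied from the progress bars in the log above; `has_headroom`, `free_gpu_gb`, and the 1.2 safety factor are hypothetical names and values chosen for illustration, and comparing against the largest shard assumes the merge works on one shard at a time, which the thread does not confirm.

```python
"""Pre-merge memory check (illustrative sketch, not Unsloth's own logic)."""

# Shard sizes in GB, copied from the progress bars in the log above.
SHARD_SIZES_GB = [4.97, 4.99, 4.93, 1.69]


def has_headroom(required_gb, free_gb, safety_factor=1.2):
    """Return True if free memory covers the requirement plus a margin.

    safety_factor is an assumed margin, not a value taken from Unsloth.
    """
    return free_gb >= required_gb * safety_factor


def free_gpu_gb():
    """Free CUDA memory in GB, or None if PyTorch or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    return free_bytes / 1e9


if __name__ == "__main__":
    largest_shard = max(SHARD_SIZES_GB)
    total = sum(SHARD_SIZES_GB)
    print(f"largest shard: {largest_shard:.2f} GB, full checkpoint: {total:.2f} GB")

    free = free_gpu_gb()
    if free is None:
        print("No CUDA device visible; skipping the GPU check.")
    elif has_headroom(largest_shard, free):
        print(f"{free:.2f} GB free on the GPU; one shard should fit.")
    else:
        print(f"Only {free:.2f} GB free on the GPU; merging may run out of memory.")
```

Running this right before `model.save_pretrained_merged(...)` shows how much GPU memory is still held by the training run at the point where the merge starts.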

Jan 28 '25 11:01 danielhanchen

> Oh, that's not good. Hmm, I'll have to auto-check the memory usage before merging.

Let me know if you want me to test fixes. I have time. Thank you.

Jan 28 '25 18:01 alkollo

After fine-tuning the visual model, it cannot be saved locally. What could be the issue? It shows that the original model cannot be found.

Feb 23 '25 07:02 jia-zhen-yu

Hello,

This should now be resolved in the latest unsloth version, as we recently pushed new save-and-merge logic for both vision and language-only models. Please upgrade your installation to the latest version using:

pip install --upgrade unsloth-zoo
pip install --upgrade unsloth
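
To confirm that the upgrade took effect in the environment you actually run from, a small check like the following may help. It is a sketch using only the standard library; `installed_version` is a hypothetical helper name, and the package names are the ones from the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Package names as used in the pip commands above.
    for name in ("unsloth", "unsloth-zoo"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

If either package shows an old version or "not installed", the script is probably running in a different environment from the one that was upgraded.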

I will close this for now. Please feel free to comment again if you still see the same issue after updating unsloth. Thank you for using Unsloth.

Jun 29 '25 23:06 rolandtannous

Unsloth vision models merging crashes