
CUDA_VISIBLE_DEVICES not functioning

Open xtchen96 opened this issue 1 year ago • 36 comments

I saw an error message when trying to do supervised fine-tuning with 4x A100 GPUs. So the free version cannot be used on multiple GPUs?

RuntimeError: Error: More than 1 GPUs have a lot of VRAM usage. Please obtain a commercial license.

xtchen96 avatar Jun 18 '24 04:06 xtchen96

Oh, currently Unsloth does not support multi-GPU, sorry - our enterprise plans have it for now - we're currently concentrating on adding Ollama support, Llama-3 bug fixes, all-model support and more in the OSS version

danielhanchen avatar Jun 18 '24 05:06 danielhanchen

@danielhanchen Is there a way to run Unsloth on only 1 GPU when I have a 2-GPU node? I get the same error, and I want to use only 1 GPU since the model easily fits on it. I tried

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

But it did not work

aflah02 avatar Jun 18 '24 08:06 aflah02

Export it via the shell before running the Python script.

ewre324 avatar Jun 18 '24 15:06 ewre324

Yep, you have to set the env variable before running Unsloth.
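
A minimal sketch of what I mean (the model name here is just an example - the key point is that the variable is set before torch or Unsloth initialise CUDA):

import os

# Expose only physical GPU 0. This must happen before torch/unsloth are
# imported, otherwise CUDA has already enumerated every device on the node.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

from unsloth import FastLanguageModel  # imported only after the mask is set

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # example model
    max_seq_length=2048,
    load_in_4bit=True,
)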

danielhanchen avatar Jun 19 '24 10:06 danielhanchen

Setting the env variable before running Unsloth still does not resolve the problem.

I used export CUDA_VISIBLE_DEVICES=0, but it still comes up with the error: RuntimeError: Error: More than 1 GPUs have a lot of VRAM usage. Please obtain a commercial license.

I also used export CUDA_VISIBLE_DEVICES=1, but the same problem occurs.

miary avatar Jun 19 '24 13:06 miary

@danielhanchen I am confused, kindly help. The error is asking me to obtain a commercial licence.

@miary what GPUs are you using and are they already running another job?

ewre324 avatar Jun 19 '24 13:06 ewre324

@danielhanchen I have 2 GPUs, both RTX 3090. This runtime error about more than 1 GPU is a brand new issue that came from Unsloth 2024.6.

I have a project that is using Unsloth 2024.5 and it works just fine.

It is completely fine if Unsloth wants to charge for environments with more than one GPU. However, the option should be given to use only one GPU, which is what setting the CUDA_VISIBLE_DEVICES env variable is supposed to do, but it's apparently broken. This looks like a really bad bug because it breaks the entire project.

miary avatar Jun 19 '24 18:06 miary

Hmm I shall investigate this hmmm.

How do you all call Unsloth? Via the terminal as a Python script? Via Jupyter?

danielhanchen avatar Jun 20 '24 13:06 danielhanchen

I am using a Python script and had the same issue while trying to run on GPU 1 (if I set the code to have visibility only on GPU 0, it works fine).

I am using these as the first lines in my main code:

import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"  # enumerate GPUs in PCI bus order, matching nvidia-smi
os.environ["CUDA_VISIBLE_DEVICES"] = "1"        # expose only physical GPU 1
os.environ["GRADIO_SHARE"] = "1"                # Gradio setting (unrelated to CUDA)
os.environ["WORLD_SIZE"] = "1"                  # single-process run

Chirobocea avatar Jun 20 '24 13:06 Chirobocea

@Chirobocea So do you use python train.py or like torchrun?

danielhanchen avatar Jun 20 '24 13:06 danielhanchen

Usually I use python train.py. However, I just tried to launch it with torchrun and it has the same issue. I also checked with the debugger that torch indeed sees only one GPU, which is remapped to id 0 for the running code, while during model loading it takes VRAM only from GPU 1 as expected (per nvidia-smi).
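
For reference, a quick sanity check of that remapping (just a sketch, assuming CUDA_VISIBLE_DEVICES=1 was exported before launch):

import os
import torch

# With CUDA_VISIBLE_DEVICES=1, torch should report exactly one device whose
# local index is 0 but which corresponds to physical GPU 1 in nvidia-smi.
print(os.environ.get("CUDA_VISIBLE_DEVICES"))  # "1"
print(torch.cuda.device_count())               # 1
print(torch.cuda.get_device_name(0))           # name of physical GPU 1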

Chirobocea avatar Jun 20 '24 13:06 Chirobocea

Ok thanks for the info! Running in runpod to see what I can do! :)

danielhanchen avatar Jun 20 '24 13:06 danielhanchen

@miary @Chirobocea @aflah02 Just fixed it! Hopefully it now can work! Apologies on the issues! Please update Unsloth via

pip uninstall unsloth -y
pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git

danielhanchen avatar Jun 20 '24 14:06 danielhanchen

RuntimeError: Error: More than 1 GPUs have a lot of VRAM usage. Please obtain a commercial license.

Thanks for all your work, btw! Killer project!

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 1,029 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 30
 "-____-"     Number of trainable parameters = 41,943,040

Traceback (most recent call last):
  File "/home/matto/projects/baby-code/workspace/unsloth-orpo.py", line 128, in <module>
    orpo_trainer.train()
  File "/home/matto/miniconda3/envs/unsloth_env/lib/python3.10/site-packages/transformers/trainer.py", line 1885, in train
    return inner_training_loop(
  File "<string>", line 226, in _fast_inner_training_loop
RuntimeError: Error: More than 1 GPUs have a lot of VRAM usage. Please obtain a commercial license.

molander avatar Jun 20 '24 14:06 molander

Can confirm it does not occur with unsloth-2024.5 but does with unsloth-2024.6. If necessary, one can downgrade via:

pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git@f9689b1

molander avatar Jun 20 '24 15:06 molander

@molander Do you know if my latest fix fixes stuff?

danielhanchen avatar Jun 20 '24 15:06 danielhanchen

@danielhanchen no, as soon as I uninstall and pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git, it comes back :(

molander avatar Jun 20 '24 15:06 molander

@miary @Chirobocea @aflah02 Just fixed it! Hopefully it now can work! Apologies on the issues! Please update Unsloth via

pip uninstall unsloth -y
pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git

This patch did not solve the problem. Same error: RuntimeError: Error: More than 1 GPUs have a lot of VRAM usage. Please obtain a commercial license.

miary avatar Jun 20 '24 17:06 miary

Hmmm, weird - I tried it in Runpod with 4x GPUs and it worked - I shall retry fixing this! Sorry everyone on the issue!

danielhanchen avatar Jun 20 '24 19:06 danielhanchen

@miary @molander I updated the package again! Apologies on the issues!

I found the below to work (change 1 to any device id)

export CUDA_VISIBLE_DEVICES=1 && python train_file.py

Likewise torchrun also works with that approach.

Hope this works! Thank you for your patience!

danielhanchen avatar Jun 21 '24 05:06 danielhanchen

@miary @Chirobocea @aflah02 Just fixed it! Hopefully it now can work! Apologies on the issues! Please update Unsloth via

pip uninstall unsloth -y
pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git

I tried this one hour ago and checked. It seems that the main problem appears when I have something running on the other GPU as well. For example, if I have another script with another env running on GPU 0, I can't run Unsloth on GPU 1. The error is the same as before.

Chirobocea avatar Jun 21 '24 08:06 Chirobocea

Confirmed not working as intended. With nothing running on GPU 1, it will not run, even though Num GPUs shows as 1 in the Unsloth banner below.

But on GPU = 0, after I closed everything but Xorg, it worked.

So it would appear, at first glance, that it must use GPU 0, in which case you have a legitimate workaround. Ticket closed, back to the real work ;)

Thank you for open-sourcing. I know that it takes big balls, and I assure you, it's worth it all the way around ;)

max_steps is given, it will override any value given in num_train_epochs
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs = 1
   \\   /|    Num examples = 1,029 | Num Epochs = 1
O^O/ \_/ \    Batch size per device = 2 | Gradient Accumulation steps = 4
\        /    Total batch size = 8 | Total steps = 30
 "-____-"     Number of trainable parameters = 41,943,040

molander avatar Jun 21 '24 11:06 molander

@miary @molander I updated the package again! Apologies on the issues!

I found the below to work (change 1 to any device id)

export CUDA_VISIBLE_DEVICES=1 && python train_file.py

Likewise torchrun also works with that approach.

Hope this works! Thank you for your patience!

@danielhanchen Just wanted to confirm that your patch by including export CUDA_VISIBLE_DEVICES=1 works!!! Thanks for all the good work, greatly appreciated!

miary avatar Jun 21 '24 13:06 miary

@miary Great it worked!

@molander Thanks glad it's a workaround - I'll see what I can do. So export CUDA_VISIBLE_DEVICES=1 && python train_file.py still does not work? Do you use torchrun or python or accelerate?

danielhanchen avatar Jun 21 '24 13:06 danielhanchen

@danielhanchen Good to go here! I made a new conda env and conda installed pytorch, transformers, etc and it's working like a mule at the grand canyon! Thanks!

molander avatar Jun 21 '24 15:06 molander

Thanks @danielhanchen!!

aflah02 avatar Jun 21 '24 17:06 aflah02

This still isn't working for me. @danielhanchen can you please remove the exception that is raised when more than one GPU has over 4 GB of memory in use?

@molander Did you do anything custom on your end? Looking at the main branch, the code is still there.

user799595 avatar Jun 25 '24 16:06 user799595

I'm on a node with multiple GPUs, but I only have one in CUDA_VISIBLE_DEVICES.

The issue I'm having is with these lines in the patch_sft_trainer_tokenizer() function of tokenizer_utils.py: https://github.com/unslothai/unsloth/blob/933d9fe2cb2459f949ee2250e90a5b610d277eab/unsloth/tokenizer_utils.py#L961-L970

The check for multiple GPUs here is really a count of how many GPUs on the node are using more than 4 GB of memory. This is going to fail for anyone on a busy shared node.
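
For illustration, the kind of node-wide check being described is roughly the sketch below (using pynvml as a stand-in, not the actual Unsloth code; NVML enumerates every physical GPU regardless of CUDA_VISIBLE_DEVICES, which is why other users' jobs on a shared node trip the limit):

import pynvml

def count_busy_gpus(threshold_gb: float = 4.0) -> int:
    """Count physical GPUs on the node using more than threshold_gb of VRAM."""
    pynvml.nvmlInit()
    try:
        busy = 0
        for i in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(i)
            used_gb = pynvml.nvmlDeviceGetMemoryInfo(handle).used / 1024**3
            if used_gb > threshold_gb:
                busy += 1
        return busy
    finally:
        pynvml.nvmlShutdown()

# On a shared node where other users occupy other GPUs, this count exceeds 1
# even though the current job only ever touches a single GPU.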

I removed that check, and a similar check in llama.py: https://github.com/unslothai/unsloth/blob/933d9fe2cb2459f949ee2250e90a5b610d277eab/unsloth/models/llama.py#L1198-L1207

Then I was able to run unsloth on my node.

vvatter avatar Jun 25 '24 23:06 vvatter

Many apologies on the delay! My brother and I just relocated to SF, so I just got back to GitHub issues!

As per the discussion here, I will instead convert it to a warning saying that Unsloth is not yet functional for multi-GPU, and will still allow the finetuning process to go through (especially for shared servers).
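
Conceptually the change is something like the sketch below (not the actual diff; count_busy_gpus stands in for the VRAM check discussed above):

import warnings

# Downgrade the multi-GPU check from a hard error to a warning so finetuning
# can proceed on shared servers where other jobs occupy other GPUs.
if count_busy_gpus() > 1:
    warnings.warn(
        "Multiple GPUs appear to be in heavy use, but Unsloth does not yet "
        "support multi-GPU training. Continuing on a single GPU."
    )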

danielhanchen avatar Jul 01 '24 00:07 danielhanchen

As requested, I made it into a warning instead of an error :) Please update Unsloth and try it out! Hope it works now!

pip uninstall unsloth -y
pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/unslothai/unsloth.git

danielhanchen avatar Jul 04 '24 06:07 danielhanchen