Titan X and Titan Xp support
I'm trying to finetune Llama 3 with the code sample provided in the notebook, installed using the instructions in https://github.com/unslothai/unsloth/issues/73. Everything went smoothly until I ran into this error:
LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.shfl.sync.bfly.i32
Here's the full log:
(base) kyle@lab:~/DDS$ cd /home/kyle/DDS ; /usr/bin/env /home/kyle/anaconda3/envs/llm_finetune/bin/python /home/kyle/.vscode-server/extensions/ms-python.python-2024.4.1/python_files/lib/python/debugpy/adapter/../../debugpy/launcher 53243 -- /home/kyle/DDS/llm_fine_tune.py --hoi_path /datasets/video --max_epochs 1 --split train --is_training
Creating VidOR dataloader for train split
/home/kyle/anaconda3/envs/llm_finetune/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
==((====))== Unsloth: Fast Llama patching release 2024.5
\\ /| GPU: NVIDIA TITAN Xp. Max memory: 11.91 GB. Platform = Linux.
O^O/ \_/ \ Pytorch: 2.3.0. CUDA = 6.1. CUDA Toolkit = 11.8.
\ / Bfloat16 = TRUE. Xformers = 0.0.26.post1. FA = False.
"-____-" Free Apache license: http://github.com/unslothai/unsloth
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
Unsloth: unsloth/llama-3-8b-bnb-4bit has no tokenizer.model file.
Just informing you about this - this is not a critical error.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Unsloth 2024.5 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
max_steps is given, it will override any value given in num_train_epochs
==((====))== Unsloth - 2x faster free finetuning | Num GPUs = 1
\\ /| Num examples = 2,009 | Num Epochs = 1
O^O/ \_/ \ Batch size per device = 2 | Gradient Accumulation steps = 4
\ / Total batch size = 8 | Total steps = 60
"-____-" Number of trainable parameters = 41,943,040
0%| | 0/60 [00:00<?, ?it/s]
LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.shfl.sync.bfly.i32
I tested this on both a Titan X and a Titan Xp and got the same error. The CUDA version displayed also strikes me as a little odd, because the system I'm working on returns 11.5 when I run "nvcc --version".
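For what it's worth, those two numbers appear to mean different things: "nvcc --version" reports the system-wide CUDA toolkit (11.5 here), the banner's "CUDA Toolkit = 11.8" is the toolkit the installed PyTorch build ships with, and "CUDA = 6.1" looks like the GPU's compute capability rather than a toolkit version. A minimal sanity-check sketch, assuming it's run in the same conda env:

import torch

# Toolkit version the installed PyTorch wheel was built against
# (the 11.8 in the Unsloth banner, independent of system nvcc).
print("PyTorch CUDA toolkit:", torch.version.cuda)

# Compute capability of device 0 -- the Titan Xp reports (6, 1),
# which matches the "CUDA = 6.1" line in the banner.
print("Compute capability:", torch.cuda.get_device_capability(0))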
Maybe the same issue as this? https://github.com/unslothai/unsloth/issues/309
The Titan X and Titan Xp came out in 2015 and 2017, respectively, which means they're simply too old for this.
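For context, Unsloth's kernels are compiled through Triton, and Triton only supports NVIDIA GPUs with compute capability 7.0 or newer (Volta onward); both of these cards sit below that threshold, which is consistent with the LLVM intrinsic-selection failure above. A minimal preflight guard one could run before training (check_triton_capability is a hypothetical helper, not an Unsloth API):

import torch

def check_triton_capability(device: int = 0) -> None:
    # Triton targets compute capability 7.0+, so pre-Volta cards
    # (e.g. the Titan Xp at 6.1) fail when kernels are compiled.
    major, minor = torch.cuda.get_device_capability(device)
    if (major, minor) < (7, 0):
        raise RuntimeError(
            f"GPU reports compute capability {major}.{minor}; "
            "Triton (and hence Unsloth) needs 7.0 or newer."
        )

check_triton_capability()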
Hmmm, unsure actually - it's possible that an older Unsloth version might work, so you could try that.
But I'm not certain - I remember some people managed to make it work.