Titan X and Titan Xp support
I'm trying to finetune Llama 3 with the code sample provided in the notebook, installed using the instructions in https://github.com/unslothai/unsloth/issues/73. Everything went smoothly until I ran into this error:
LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.shfl.sync.bfly.i32
Here's the full log:
(base) kyle@lab:~/DDS$ cd /home/kyle/DDS ; /usr/bin/env /home/kyle/anaconda3/envs/llm_finetune/bin/python /home/kyle/.vscode-server/extensions/ms-python.python-2024.4.1/python_files/lib/python/debugpy/adapter/../../debugpy/launcher 53243 -- /home/kyle/DDS/llm_fine_tune.py --hoi_path /datasets/video --max_epochs 1 --split train --is_training
Creating VidOR dataloader for train split
/home/kyle/anaconda3/envs/llm_finetune/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
==((====))== Unsloth: Fast Llama patching release 2024.5
\\ /| GPU: NVIDIA TITAN Xp. Max memory: 11.91 GB. Platform = Linux.
O^O/ \_/ \ Pytorch: 2.3.0. CUDA = 6.1. CUDA Toolkit = 11.8.
\ / Bfloat16 = TRUE. Xformers = 0.0.26.post1. FA = False.
"-____-" Free Apache license: http://github.com/unslothai/unsloth
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
Unsloth: unsloth/llama-3-8b-bnb-4bit has no tokenizer.model file.
Just informing you about this - this is not a critical error.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Unsloth 2024.5 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
max_steps is given, it will override any value given in num_train_epochs
==((====))== Unsloth - 2x faster free finetuning | Num GPUs = 1
\\ /| Num examples = 2,009 | Num Epochs = 1
O^O/ \_/ \ Batch size per device = 2 | Gradient Accumulation steps = 4
\ / Total batch size = 8 | Total steps = 60
"-____-" Number of trainable parameters = 41,943,040
0%| | 0/60 [00:00<?, ?it/s]
LLVM ERROR: Cannot select: intrinsic %llvm.nvvm.shfl.sync.bfly.i32
I tested this on both a Titan X and a Titan Xp and got the same error. The CUDA version displayed also strikes me as a little odd, because the system I'm working on returns 11.5 when I run "nvcc --version".
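For what it's worth, those two numbers appear to mean different things: "nvcc --version" reports the system-wide CUDA toolkit (11.5 here), the banner's "CUDA Toolkit = 11.8" is the toolkit the installed PyTorch build ships with, and "CUDA = 6.1" looks like the GPU's compute capability rather than a toolkit version. A minimal sanity-check sketch, assuming it's run in the same conda env:

import torch

# Toolkit version the installed PyTorch wheel was built against
# (the 11.8 in the Unsloth banner, independent of system nvcc).
print("PyTorch CUDA toolkit:", torch.version.cuda)

# Compute capability of device 0 -- the Titan Xp reports (6, 1),
# which matches the "CUDA = 6.1" line in the banner.
print("Compute capability:", torch.cuda.get_device_capability(0))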
Maybe the same issue as this? https://github.com/unslothai/unsloth/issues/309
The Titan X and Titan Xp came out in 2015 and 2017, respectively, which means they're simply too old for this.
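For context, Unsloth's kernels are compiled through Triton, and Triton only supports NVIDIA GPUs with compute capability 7.0 or newer (Volta onward); both of these cards sit below that threshold, which is consistent with the LLVM intrinsic-selection failure above. A minimal preflight guard one could run before training (check_triton_capability is a hypothetical helper, not an Unsloth API):

import torch

def check_triton_capability(device: int = 0) -> None:
    # Triton targets compute capability 7.0+, so pre-Volta cards
    # (e.g. the Titan Xp at 6.1) fail when kernels are compiled.
    major, minor = torch.cuda.get_device_capability(device)
    if (major, minor) < (7, 0):
        raise RuntimeError(
            f"GPU reports compute capability {major}.{minor}; "
            "Triton (and hence Unsloth) needs 7.0 or newer."
        )

check_triton_capability()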
Hmmm, unsure actually - it's possible that an older Unsloth version might work, so you could try that.
But I'm not certain - I remember some people managed to make it work.