
Kaggle Notebook fails to train Llama3 on T4 GPU

Open raj-chinagundi opened this issue 9 months ago • 6 comments

I followed the Kaggle notebook guide from the README and executed it step by step on Kaggle's T4 x2 GPU accelerator. It turns out the SFT Trainer detects two GPUs during training, but the tensors need to be on the same device. So I factory-reset the session and forced it to use only one T4 GPU instead.

[Screenshot 2024-05-06 at 3 45 11 AM] [Screenshot 2024-05-06 at 3 46 21 AM]

In the screenshots you can see the session now has only a single GPU, so I thought we were good to go. But then I ran into this issue:

[Screenshot 2024-05-06 at 3 48 11 AM]
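For reference, a common workaround for this kind of "tensors on different devices" error (a sketch, not taken from this thread; the choice of device index 0 is an assumption) is to hide the second GPU via `CUDA_VISIBLE_DEVICES` at the very top of the notebook, before `torch` or `unsloth` is imported:

```python
import os

# Hide every GPU except device 0. The CUDA runtime reads this variable
# once at initialization, so it must be set *before* importing
# torch/unsloth anywhere in the notebook.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# After this point, torch.cuda.device_count() should report 1, so the
# trainer has no second device to scatter tensors onto.
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

Note this only works if it runs before the first CUDA initialization; setting it after `import torch` has already touched the GPU has no effect, which may be why a session restart appeared necessary.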

I have exhausted my free resources on Colab, so I came to Kaggle, but it looks like I am having a really hard time just fine-tuning an LLM, which was supposed to be the easy part when using unsloth. I kindly request assistance in setting up a Kaggle notebook that runs end to end smoothly.

raj-chinagundi avatar May 05 '24 22:05 raj-chinagundi

I then updated my torch to 2.3.0, but it ran into conflicts with other dependencies.

[Screenshot 2024-05-06 at 3 53 00 AM]

raj-chinagundi avatar May 05 '24 22:05 raj-chinagundi

Currently we do not support multi-GPU. Use our Kaggle Llama-3 notebook as is: https://www.kaggle.com/code/danielhanchen/kaggle-llama-3-8b-unsloth-notebook. It does not work on 2x T4s.

danielhanchen avatar May 08 '24 18:05 danielhanchen

[Screenshot 2024-05-09 at 7 10 06 AM]

I know you don't support 2x T4, but there is no option for 1x T4, and it fails on the P100. I don't know what to do next; please assist. @danielhanchen

raj-chinagundi avatar May 09 '24 01:05 raj-chinagundi

Oh 2x Tesla T4 is the option!

danielhanchen avatar May 09 '24 17:05 danielhanchen

Just select it

danielhanchen avatar May 09 '24 17:05 danielhanchen

It's not working on 2x Tesla T4; that's why I created this issue.

raj-chinagundi avatar May 09 '24 20:05 raj-chinagundi