
llama finetuning - bitsandbytes

Open · atadria opened this issue 1 year ago · 1 comment

Hi, I tried to fine-tune Llama on a Jetson AGX Orin Developer Kit 64GB (JetPack 5.1.2, L4T R35.4.1), using dustynv/pytorch:2.1-r35.4.1 as the base image. After the model is downloaded from Hugging Face Hub and the device map is set to CUDA device 0 ({"": 0}), I get this error:

Error invalid device ordinal at line 359 in file /opt/bitsandbytes/csrc/pythonInterface.c
/arrow/cpp/src/arrow/filesystem/s3fs.cc:2829:  arrow::fs::FinalizeS3 was not called even though S3 was initialized.  This could lead to a segmentation fault at exit

I'm looking for suggestions on how to resolve this issue. Has anyone successfully fine-tuned the Llama model on a Jetson AGX Orin Developer Kit and can share their code or insights?
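
For reference, the loading step looks roughly like the sketch below (the model id and exact quantization flags here are placeholders, not my exact script):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder; substitute the actual model

# bitsandbytes quantization config (exact flags are an assumption)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map={"": 0},  # map all modules to CUDA device 0, as described above
)
```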

— atadria, Oct 04 '23 09:10

Hi @atadria, I haven't tried fine-tuning on Jetson. Are you actually using the dustynv/transformers container (which has bitsandbytes built in)? You could try disabling bitsandbytes, either when you call the transformers API or by changing the transformers dockerfile so it doesn't rely on bitsandbytes. With AutoGPTQ integrated into HF, bitsandbytes probably gets used less and less (it's slow anyway).
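
If it helps, skipping bitsandbytes at the transformers API level might look roughly like this (a minimal sketch; the model id is a placeholder and fp16 is just one reasonable choice on the 64GB Orin):

```python
import torch
from transformers import AutoModelForCausalLM

# Load in plain fp16 with no quantization_config / load_in_8bit,
# so bitsandbytes is never invoked.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder model id
    torch_dtype=torch.float16,
    device_map={"": 0},          # still pinned to CUDA device 0
)
```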

Also, Apache Arrow is a huge project and it's difficult to ascertain what the issue may be, or whether it needs to be built from source. There are some dockerfiles in this repo that build it (under packages/rapids, IIRC).

Also, I'm curious which model you are attempting to fine-tune and how much memory it may need? I didn't know whether it was feasible to fine-tune on the device or not. If so, that would be great!
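
For rough orientation, here's the back-of-the-envelope math I'd use (all numbers are assumptions: a 7B-parameter model, fp16 weights and gradients, fp32 Adam states, activations ignored):

```python
params = 7e9  # assumed 7B-parameter Llama

weights = params * 2      # fp16 weights: 2 bytes/param
grads   = params * 2      # fp16 gradients
adam    = params * 4 * 2  # fp32 Adam m and v states

print(f"full fine-tune: ~{(weights + grads + adam) / 1e9:.0f} GB")  # ~84 GB

# LoRA-style tuning freezes the base model, so it's roughly just the
# fp16 weights plus a small adapter and its optimizer overhead.
print(f"LoRA-style:     ~{weights / 1e9:.0f} GB")                   # ~14 GB
```

By that math a full fine-tune of even a 7B model looks out of reach on 64GB once activations are counted, while LoRA/QLoRA-style approaches should fit comfortably.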

FWIW, I think you can also do fine-tuning through oobabooga if that helps...

— dusty-nv, Oct 04 '23 13:10