abpani

Results 23 comments of abpani

I tried with deepspeed and accelerate as well, but I still can't use a batch size of more than 1.

![Screenshot 2024-09-19 at 9 24 28 AM](https://github.com/user-attachments/assets/b02fc686-ca5b-48d9-9e17-14c4e91fbeef) Now I can do a batch size of 2, but the last GPU is almost full. It may give an OOM on a specific sample...
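For what it's worth, a minimal sketch of how the per-GPU usage could be capped so the auto device map doesn't pack the last GPU to its limit; the model id, the memory limits, and the GPU count here are placeholders, not my actual values:

```python
# Sketch: cap how much of each GPU the "auto" device map may use, so the
# dispatch leaves headroom on the last GPU. All values are placeholders.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",  # placeholder model id
    device_map="auto",
    max_memory={0: "20GiB", 1: "20GiB", 2: "20GiB", 3: "20GiB"},  # assumed 4 GPUs
)
```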

You suggested trying deepspeed. I tried it, but my accelerator.process_index only shows GPU 0, so the model gets loaded onto GPU 0 only. Then it raises an OOM error with batch...
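As a sanity check, a minimal sketch of what I would print on each process, assuming the script is started with `accelerate launch`; with a plain `python` launch there is only one process, so `process_index` will always be 0:

```python
# Print what each process sees; if this only ever prints process_index=0,
# the job was not actually launched as a multi-process run.
import torch
from accelerate import Accelerator

accelerator = Accelerator()
print(
    f"process_index={accelerator.process_index}, "
    f"num_processes={accelerator.num_processes}, "
    f"device={accelerator.device}, "
    f"visible_gpus={torch.cuda.device_count()}"
)
```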

# This is the traceback when I do batch size 3.

    Traceback (most recent call last):
      File "/home/ubuntu/abpani/FundName/llama3_8b_qlora-bnb.py", line 94, in trainer.train()
      File "/home/ubuntu/abpani/FundName/myenv/lib/python3.10/site-packages/trl/trainer/sft_trainer.py", line 450, in train
        output = super().train(*args,...

Now I just tried Mistral v0.3 Instruct with per_device_train_batch_size = 6, per_device_eval_batch_size = 6, and gradient_accumulation_steps = 8. ![Screenshot 2024-09-19 at 10 17 32 AM](https://github.com/user-attachments/assets/748e0933-5fcc-4dc0-9cf6-47d23afcbcd6) It is working great.
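Roughly, the relevant part of the configuration (a sketch; only the batch-size-related values are the ones quoted above, the output path and precision flag are placeholders):

```python
from transformers import TrainingArguments

# Sketch of the arguments that worked with Mistral v0.3 Instruct.
training_args = TrainingArguments(
    output_dir="./mistral-v03-instruct-qlora",  # placeholder
    per_device_train_batch_size=6,
    per_device_eval_batch_size=6,
    gradient_accumulation_steps=8,
    bf16=True,                                  # assumption: GPUs support bf16
)
```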

@qgallouedec I am trying with an 8192 context length. I tried installing a lot of package combinations and still hit the same issue with models that have a large vocab size.
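For context, a rough back-of-the-envelope of why the large vocabulary hurts at this context length, assuming a ~128k vocabulary like Llama 3's (the numbers are illustrative):

```python
# Logits alone are batch * seq_len * vocab_size values; with a 128k vocab and
# 8192 tokens that is ~2 GiB per sample in bf16, before the fp32 upcast in the
# cross-entropy loss roughly doubles it.
batch, seq_len, vocab = 1, 8192, 128256   # assumed Llama-3-sized vocab
bytes_per_value = 2                       # bf16
logits_gib = batch * seq_len * vocab * bytes_per_value / 1024**3
print(f"logits per sample: ~{logits_gib:.2f} GiB")
```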

@qgallouedec The issue still persists with the Qwen2.5 3B model; I can't go above a batch size of 1 with bitsandbytes 4-bit and context_length = 5192. I am unable to think why you...

Still the same issue. It shows different errors, like weights loaded on different devices (cuda:0 and cuda:1). device_map = {'model.embed_tokens': 0, 'model.layers.0': 0, 'model.layers.1': 0, 'model.layers.2': 0, 'model.layers.3': 0,...
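A minimal sketch of what could be tried instead of splitting layers across cuda:0 and cuda:1: load one whole 4-bit copy of the model onto the GPU owned by each process (the model id and quantization settings below are placeholders, not my exact script):

```python
import torch
from accelerate import PartialState
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load one whole 4-bit copy of the model onto this process's GPU, so no module
# ends up on a different device than its inputs.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",        # placeholder model id
    quantization_config=bnb_config,
    device_map={"": PartialState().process_index},  # whole model on the local GPU
)
```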

@LysandreJik You can find the details about the device map here: https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct/blob/main/model.safetensors.index.json

@LysandreJik I tried as you suggested, but it's still the same issue in a multi-GPU environment.