Philipp Schmid

Results 136 comments of Philipp Schmid

Hey @qmdnls, what you say could well be true. I created the patch only for training, where gradient checkpointing is used and the cache is disabled. If you are interested in...

Can you please share the code you use to save the model?

Which GPU are you using? You need at least 24GB. If you do have that, it is possible that the "cell" where you load the model was run multiple times.

15GB of GPU RAM is not enough to load the model in int8. That's why you see the error. Yes, you can adapt the example by changing the `model_id`. You can...
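As a rough sketch of the memory arithmetic behind that answer: int8 weights take about one byte per parameter, fp16 about two, so weights alone can exceed a 15GB card. The 20B parameter count below is an illustrative assumption, not necessarily the model from this thread, and actual usage is higher (activations, CUDA context, fragmentation).

```python
# Back-of-the-envelope GPU memory needed for model *weights only*, by dtype.
# Real usage is higher: activations, KV cache, CUDA context, fragmentation.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Gigabytes (GiB) needed just to hold the weights in the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3

# Illustrative: a hypothetical 20B-parameter model in int8 needs roughly
# 18.6 GiB for weights alone, so a 15GB GPU cannot even load it.
print(round(weight_memory_gb(20e9, "int8"), 1))
```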

Your model is small enough to fit on a single GPU. DeepSpeed then applies data parallelism and runs a model replica on each GPU. You should see a faster training time.

You can write a callback for the `Trainer` which is executed after an evaluation phase. https://huggingface.co/docs/transformers/main_classes/callback#transformers.TrainerCallback.on_evaluate
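A minimal sketch of what such a callback could look like. To keep the snippet self-contained it uses a plain-Python stand-in base class; in a real project you would subclass the actual `transformers.TrainerCallback` (per the linked docs) and pass an instance via `Trainer(callbacks=[...])`. The class name `EvalLoggerCallback` and the printed format are illustrative assumptions.

```python
# Stand-in base class mirroring the name of transformers.TrainerCallback;
# replace it with `from transformers import TrainerCallback` in a real project.
class TrainerCallback:
    pass

class EvalLoggerCallback(TrainerCallback):
    """Hypothetical callback that runs after each evaluation phase."""

    def on_evaluate(self, args, state, control, metrics=None, **kwargs):
        # `metrics` holds the evaluation results, e.g. {"eval_loss": ...};
        # `state.global_step` is the current training step in the real API.
        if metrics is not None:
            step = getattr(state, "global_step", "?")
            print(f"step {step}: eval metrics {metrics}")
        return control
```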

Hey @mallorbc, I needed to revert #30 since it broke the training for 7B and 13B; I haven't had the chance to look at it again.

@mallorbc ah, nice! I will try to make it compatible with both soonish. But we are also working on adding native support in `transformers`, so in a few weeks it will no longer...

Try again after restarting the kernel; it seems your GPU is already busy.

I kept it separate on purpose, in case you want to "decouple" the training and processing parts.