Philipp Schmid

Results 136 comments of Philipp Schmid

Hey @qmdnls, what you say could well be true. I created the patch only for training, where gradient checkpointing is used and the cache is disabled. If you are interested in...

Can you please share the code you use to save the model?

Which GPU are you using? You need at least 24GB. If you do have that, it is possible that the "cell" where you load the model was run multiple times.

15GB of GPU RAM is not enough to load the model in int8. That's why you see the error. Yes, you can adapt the example by changing the `model_id`. You can...
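As a rough sketch of the memory arithmetic behind that answer: int8 weights take about one byte per parameter, fp16 about two, so weights alone can exceed a 15GB card. The 20B parameter count below is an illustrative assumption, not necessarily the model from this thread, and actual usage is higher (activations, CUDA context, fragmentation).

```python
# Back-of-the-envelope GPU memory needed for model *weights only*, by dtype.
# Real usage is higher: activations, KV cache, CUDA context, fragmentation.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Gigabytes (GiB) needed just to hold the weights in the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3

# Illustrative: a hypothetical 20B-parameter model in int8 needs roughly
# 18.6 GiB for weights alone, so a 15GB GPU cannot even load it.
print(round(weight_memory_gb(20e9, "int8"), 1))
```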

Your model is small enough to fit on a single GPU. DeepSpeed then applies data parallelism and runs a model replica on each GPU. You should see a faster training time.

You can write a callback for the `Trainer` which is executed after an evaluation phase. https://huggingface.co/docs/transformers/main_classes/callback#transformers.TrainerCallback.on_evaluate
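A minimal sketch of what such a callback could look like. To keep the snippet self-contained it uses a plain-Python stand-in base class; in a real project you would subclass the actual `transformers.TrainerCallback` (per the linked docs) and pass an instance via `Trainer(callbacks=[...])`. The class name `EvalLoggerCallback` and the printed format are illustrative assumptions.

```python
# Stand-in base class mirroring the name of transformers.TrainerCallback;
# replace it with `from transformers import TrainerCallback` in a real project.
class TrainerCallback:
    pass

class EvalLoggerCallback(TrainerCallback):
    """Hypothetical callback that runs after each evaluation phase."""

    def on_evaluate(self, args, state, control, metrics=None, **kwargs):
        # `metrics` holds the evaluation results, e.g. {"eval_loss": ...};
        # `state.global_step` is the current training step in the real API.
        if metrics is not None:
            step = getattr(state, "global_step", "?")
            print(f"step {step}: eval metrics {metrics}")
        return control
```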

Hey @mallorbc, I needed to revert #30 since it broke the training for 7B and 13B; I haven't had the chance to look at it again.

@mallorbc ah, nice! I will try to make it compatible with both soonish. But we are also working on adding native support in `transformers`, so in a few weeks it will no longer...

Try again after restarting the kernel; it seems your GPU is already busy.

I kept it separate on purpose, in case you want to "decouple" the training and processing parts.