OpenChatKit

Can I fine-tune GPT-NeoXT-Chat-Base-20B with 8 A100s?

Open newcolour1994 opened this issue 2 years ago • 10 comments

Can you describe the computing resources needed for the experiment?

newcolour1994 avatar Mar 13 '23 09:03 newcolour1994

Similar question: how can I run inference with the model on 4x V100?

zhongtao93 avatar Mar 13 '23 09:03 zhongtao93

Similar question: what is the minimum VRAM requirement to fine-tune the model? How about 4x 4090?

encmps avatar Mar 14 '23 15:03 encmps

I would really like to know this too; it should probably be in the README. I have one 3090 to stand this up with before I can ask for more resources. If it's really big, I might try to scale the model down and submit a request for a mini model to do sanity checks on local systems and such.

riatzukiza avatar Mar 16 '23 20:03 riatzukiza

Similar question, what is the minimum requirement to finetune the model if I want to add my own docs?

Southpika avatar Mar 17 '23 03:03 Southpika

We train this model on 8x A100 80GB GPUs. I'll update the README.

I... submit a request for a mini model to do sanity checks on local systems and such

This is a great idea! Will keep this issue open to track adding such a model.

csris avatar Mar 18 '23 06:03 csris
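The hardware questions above can be roughly sanity-checked with back-of-the-envelope arithmetic. The sketch below is not from the maintainers. It assumes about 20 billion parameters (taken from the "20B" in the model name), 2 bytes per parameter for fp16 weights, and roughly 16 bytes per parameter for mixed-precision training with an Adam-style optimizer (fp16 weights and gradients plus fp32 master weights and two optimizer moments). The thread does not state which optimizer or precision is used, so treat those figures as assumptions. Activations, fragmentation, and framework overhead are ignored, and the check only compares against total GPU memory, which presumes the training code can shard state across GPUs.

```python
# Rough memory estimate; every constant here is an assumption, not a measured value.
N_PARAMS = 20e9                 # assumed from the "20B" in the model name
BYTES_FP16_WEIGHTS = 2          # fp16 weights only (inference)
BYTES_MIXED_ADAM = 16           # assumed: 2 (weights) + 2 (grads) + 4 + 4 + 4 (fp32 copy, Adam moments)


def estimate_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory in GB (10^9 bytes) for n_params at bytes_per_param each."""
    return n_params * bytes_per_param / 1e9


def fits_in_aggregate(needed_gb: float, n_gpus: int, gb_per_gpu: float) -> bool:
    """True if the requirement is at most the combined memory of all GPUs."""
    return needed_gb <= n_gpus * gb_per_gpu


inference_gb = estimate_memory_gb(N_PARAMS, BYTES_FP16_WEIGHTS)
training_gb = estimate_memory_gb(N_PARAMS, BYTES_MIXED_ADAM)

print(f"fp16 weights: {inference_gb:.0f} GB")
print(f"training state (no activations): {training_gb:.0f} GB")
print("8x A100 80GB:", fits_in_aggregate(training_gb, 8, 80))
print("8x V100 32GB:", fits_in_aggregate(training_gb, 8, 32))
print("4x 4090 24GB:", fits_in_aggregate(training_gb, 4, 24))
```

Under these assumptions the weights alone need about 40 GB and full fine-tuning state about 320 GB, which is consistent with 8x A100 80GB being comfortable and smaller setups needing a memory-saving technique that this thread does not cover.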

Can I train it on a single A100 80GB GPU, or on fewer than eight? Would it just take more time, or would it not run at all?

Southpika avatar Mar 18 '23 10:03 Southpika

Can I fine-tune the model on 8x V100 32GB GPUs with a smaller batch size?

puppet101 avatar Mar 23 '23 03:03 puppet101

Can I train it on a single A100 80GB GPU, or on fewer than eight? Would it just take more time, or would it not run at all?

up

raihan0824 avatar Mar 23 '23 17:03 raihan0824

We train this model on 8x A100 80GB GPUs. I'll update the README.

I... submit a request for a mini model to do sanity checks on local systems and such

This is a great idea! Will keep this issue open to track adding such a model.

How long does it take to train on 8x A100?

joydchh avatar Mar 24 '23 03:03 joydchh

About an hour per 100 steps. Usually, we fine-tune for a couple of days.

csris avatar Mar 24 '23 03:03 csris
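For planning purposes, the throughput figure in the reply above can be turned into a rough step and GPU-hour budget. This is only a sketch: it reads "a couple of days" as exactly 2 days and assumes the rate of one hour per 100 steps stays constant on the 8-GPU setup mentioned earlier in the thread.

```python
HOURS_PER_100_STEPS = 1.0   # from the reply: "about an hour per 100 steps"
N_GPUS = 8                  # from the thread: 8x A100 80GB


def steps_for(days: float) -> int:
    """Approximate optimizer steps completed in the given number of days."""
    return int(days * 24 / HOURS_PER_100_STEPS * 100)


def gpu_hours_for(days: float) -> float:
    """Total GPU-hours consumed across all GPUs."""
    return days * 24 * N_GPUS


ASSUMED_DAYS = 2            # assumption: "a couple of days" taken as 2
print(steps_for(ASSUMED_DAYS), "steps")
print(gpu_hours_for(ASSUMED_DAYS), "GPU-hours")
```

With these assumptions, a two-day run is on the order of 4,800 steps and 384 GPU-hours.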