
Does it run on a single NVIDIA RTX A4000?

exander77 opened this issue 1 year ago • 7 comments

Does it run on a single NVIDIA RTX A4000, or do I need two or more?

exander77 avatar Mar 21 '23 11:03 exander77

It looks like, as of right now, you need at least 48 GB of VRAM, e.g. an A100 80GB. There are people on the Discord server who have managed to run it on smaller GPUs, either by splitting the model across multiple GPUs or by using quantization.

I would check out the Discord for more info/help

orangetin avatar Mar 21 '23 23:03 orangetin
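
For anyone trying the quantization route mentioned above, here is a minimal sketch, assuming the Hugging Face transformers, accelerate, and bitsandbytes packages and the togethercomputer/GPT-NeoXT-Chat-Base-20B checkpoint; it is an illustration, not the repo's official inference path:

```python
# Sketch: loading the 20B chat model in 8-bit so it fits in less GPU memory.
# Requires: pip install transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "togethercomputer/GPT-NeoXT-Chat-Base-20B"  # assumed checkpoint name

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",   # let accelerate place layers on available GPUs/CPU
    load_in_8bit=True,   # int8 weights: roughly half the fp16 footprint
)

prompt = "<human>: What is a language model?\n<bot>:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```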

@exander77 you can find the Multi GPU discord thread here: https://discord.com/channels/1082503318624022589/1082510608123056158/1084210191635058759

amaliako avatar Mar 28 '23 14:03 amaliako

I've had marginal luck using a 4090 with 24 GB of VRAM. The trick was to not give it ALL of your memory, because it needs some for data loading and some for processing. Quantization helped some too.

joecodecreations avatar Mar 31 '23 06:03 joecodecreations
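
The trick of not handing the model all of your VRAM can be made explicit when loading through Hugging Face transformers/accelerate. A minimal sketch with illustrative memory limits and an assumed checkpoint name:

```python
# Sketch: capping how much VRAM the model may use so headroom remains for
# activations and data loading; anything that doesn't fit goes to CPU RAM
# (and spills to disk via offload_folder if needed).
import torch
from transformers import AutoModelForCausalLM

model_name = "togethercomputer/GPT-NeoXT-Chat-Base-20B"  # assumed checkpoint

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory={0: "20GiB", "cpu": "64GiB"},  # leave a few GB of a 24 GB card free
    offload_folder="offload",
)
```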

(Screenshot) Some output from the 4090

joecodecreations avatar Mar 31 '23 06:03 joecodecreations

> I've had marginal luck using a 4090 with 24 GB of VRAM. The trick was to not give it ALL of your memory, because it needs some for data loading and some for processing. Quantization helped some too.

Yup. This issue was opened before the Pythia base model was out. This should run on GPUs with more than 12 GB of VRAM, or on GPUs with less than that by offloading to CPU/disk.

The GPT-NeoX-20B model still requires about 40 GB of memory just to be loaded.

This issue can be closed. Yes, the Pythia model can run inference on a single Nvidia RTX A4000.

orangetin avatar Mar 31 '23 13:03 orangetin
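
A minimal sketch of single-GPU inference with the Pythia-based model, assuming the togethercomputer/Pythia-Chat-Base-7B checkpoint and plain Hugging Face transformers (not the repo's own inference script):

```python
# Sketch: running the Pythia-based chat model on a single ~16 GB card in fp16.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "togethercomputer/Pythia-Chat-Base-7B"  # assumed checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # ~14 GB of weights for a 7B model
).to("cuda:0")

prompt = "<human>: Does OpenChatKit run on one GPU?\n<bot>:"
inputs = tokenizer(prompt, return_tensors="pt").to("cuda:0")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```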


I can confirm the Pythia model works on a single Nvidia RTX A4000. I am figuring out what I need for GPT-NeoX-20B.

GPU VRAM usage with the Pythia model: 14595 MiB / 16376 MiB

exander77 avatar Apr 04 '23 10:04 exander77

> I can confirm the Pythia model works on a single Nvidia RTX A4000. I am figuring out what I need for GPT-NeoX-20B.
>
> GPU VRAM usage with the Pythia model: 14595 MiB / 16376 MiB

That's great! For the GPT-NeoX-20B model you'll either need more than 40 GB of VRAM (roughly 20 GB in 8-bit), or you can use CPU offloading to run it on your A4000 by adding the flag -g 0:14 (there's a PR up that should let you increase the 14 to 16).

orangetin avatar Apr 04 '23 14:04 orangetin
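
A quick back-of-the-envelope check of those numbers (weights only, ignoring activations and the KV cache):

```python
# Rough memory arithmetic for GPT-NeoX-20B weights.
params = 20e9                                # parameter count
print(f"fp16: {params * 2 / 1e9:.0f} GB")    # ~40 GB -> needs >40 GB VRAM
print(f"int8: {params * 1 / 1e9:.0f} GB")    # ~20 GB in 8-bit
```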