OpenChatKit
Is it possible to run the system on Google Colab?
Is it possible to reduce the amount of resources needed to run the system on Google Colab? Not everyone has the means to experiment with an A100 80 GB.
+1
Google Colab Pro offers an A100 40 GB with 40 GB of RAM when using the high-RAM runtime, IIRC.
You'd need to try it on Google Colab Pro+; it may or may not have enough resources. While I have not tested it on a Google Colab Pro+ account, I can confirm that it does NOT run on Google Colab Pro due to insufficient resources.
Google Colab Pro Specs used:
- NVIDIA A100-SXM4-40GB
- System RAM = 83.5 GB (I believe this is RAM + GPU)
Here's the code for the Colab ipynb if you want to test it out for yourself, or if anyone else has access to Pro+.
A free Google Colab will definitely not be sufficient.
I tried with Colab free:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 14.75 GiB total capacity; 13.79 GiB already allocated; 2.81 MiB free; 13.93 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
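The error message itself suggests trying max_split_size_mb. For reference, that setting has to be in place before CUDA is first initialized; a minimal sketch (the 128 below is just an example value, and it likely won't help here since the model simply doesn't fit in 15 GB):

```python
import os

# PYTORCH_CUDA_ALLOC_CONF must be set before the first CUDA allocation,
# i.e. before any tensor is placed on the GPU in the notebook session.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch
torch.zeros(1, device="cuda")  # the allocator now uses the setting above
```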
@husnoo yeah, the models are too big to be loaded onto a free account without 8-bit. Can you try this instead: https://colab.research.google.com/github/orangetin/OpenChatKit/blob/colab-example/inference/example/example.ipynb
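For reference, loading a model in 8-bit with Hugging Face transformers plus bitsandbytes typically looks something like the sketch below; the linked notebook may differ in its details.

```python
# Sketch: 8-bit inference with transformers + bitsandbytes
# (pip install transformers accelerate bitsandbytes)
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "togethercomputer/Pythia-Chat-Base-7B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",  # let accelerate place layers on the available GPU
    load_in_8bit=True,  # 8-bit weights: roughly half the vRAM of fp16
)

prompt = "<human>: Hello!\n<bot>:"  # OpenChatKit's chat format
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```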
@orangetin I can confirm that inference with togethercomputer/Pythia-Chat-Base-7B works on Google Colab Pro+, but not with togethercomputer/GPT-NeoXT-Chat-Base-20B (that model loads, but consumes 39.4 GB of vRAM and thus crashes when an inference is made).
The 7B model seems to consume only 14.8 GB of vRAM, even with the full (non-8-bit) version.
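If anyone wants to double-check vRAM figures like these, a quick generic way from a notebook cell (not taken from the linked notebook):

```python
import torch

# Current and peak allocations as seen by PyTorch, in GB
print(f"allocated: {torch.cuda.memory_allocated() / 1024**3:.1f} GB")
print(f"peak:      {torch.cuda.max_memory_allocated() / 1024**3:.1f} GB")

# Or ask the driver directly (also counts non-PyTorch allocations):
# !nvidia-smi
```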
What I'm now trying to solve, if anyone has hints about it, is whether it is possible to fine-tune the model directly in Colab with a single 40 GB GPU. The training code looks like it has been designed specifically for multi-GPU setups, but maybe it can be tweaked.
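For a rough sense of why this is hard: full fine-tuning with mixed-precision Adam needs on the order of 16 bytes per parameter (fp16 weights and gradients, fp32 master weights, and two fp32 optimizer moments), before even counting activations. A back-of-the-envelope estimate:

```python
# ~16 bytes/param for mixed-precision Adam:
# 2 (fp16 weights) + 2 (fp16 grads) + 4 (fp32 master weights)
# + 4 + 4 (fp32 Adam moments m and v); activations come on top.
BYTES_PER_PARAM = 16

for name, params in [("Pythia-Chat-Base-7B", 7e9),
                     ("GPT-NeoXT-Chat-Base-20B", 20e9)]:
    gb = params * BYTES_PER_PARAM / 1024**3
    print(f"{name}: ~{gb:.0f} GB of weight/gradient/optimizer state")
# => ~104 GB for 7B, ~298 GB for 20B: far beyond a single 40 GB A100
```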
@leclem Is Colab Pro+ worth it compared to Colab Pro for fine-tuning and running inference on LLMs?
@thomasjv799
No difference for the moment, unfortunately.
A major hardware limitation for playing with LLMs is vRAM, the GPU's own memory, onto which the model must be loaded for the GPU to operate on it. Because LLMs are big, they need GPUs with a lot of vRAM.
For the moment, the best GPU you can get on Google Colab Pro or Pro+ is an NVIDIA A100 with 40 GB of vRAM, which is limiting for fine-tuning LLMs. For inference it is enough, provided you quantize to 4 bits (or 8 bits for the smaller models). Colab Pro+ gives you more credits than Colab Pro, so more time on the A100 and priority access to it, but it does not give you more vRAM.
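To illustrate the 4-bit route, loading through transformers with a bitsandbytes quantization config looks roughly like this (a sketch assuming recent transformers/bitsandbytes versions, not tested on this exact model):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # 4-bit NormalFloat quantization
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16 for speed
)

model = AutoModelForCausalLM.from_pretrained(
    "togethercomputer/GPT-NeoXT-Chat-Base-20B",
    device_map="auto",
    quantization_config=quant_config,
)
# ~0.5 bytes per weight: a 20B model fits in roughly 10-12 GB of vRAM
```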
Nevertheless, supporting 80 GB vRAM GPUs on Colab Pro+ appears to be a feature request on their roadmap (https://github.com/googlecolab/colabtools/issues/3784), so the answer for the moment is no, but when they add those GPUs it will be yes.
For the moment, I have been using https://www.runpod.io/, which provides an interface similar to Colab and supports the A100 80 GB (a version of the A100 with 80 GB of vRAM) as well as the NVIDIA H100, which also has 80 GB of vRAM, and it works well. It also supports attaching multiple GPUs to a machine, which is a prerequisite for the kind of fine-tuning done by OpenChatKit: it fine-tunes the whole model (not just a part of it), which requires more than 80 GB of vRAM and is thus distributed over multiple GPUs. https://www.runpod.io/gpu-instance/pricing