minimal-gpt-neox-20b
How much RAM needed to run model?
Thanks for creating this implementation. I've tried running it in a Google Colab Pro notebook, but the session keeps crashing due to maxing out the RAM. Do you have any sense of how much RAM is needed to run the model? Thanks!
~40 GB of VRAM to load the model, and ~45 GB to run inference at the full 2048-token context length. You'll also need about twice that in CPU RAM (~81 GB), since the model is currently instantiated on the CPU in fp32, then converted to fp16 and uploaded to VRAM. Not sure if that's intended behaviour; it looks like it's supposed to create meta tensors instead, but that isn't actually working.
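Those numbers line up with back-of-the-envelope math on the weights alone (a rough sketch; the ~20B parameter count is an approximation, and activations/KV cache add the extra few GB at inference time):

```python
# Approximate weight-memory math for a ~20B-parameter model.
N_PARAMS = 20e9  # roughly 20 billion parameters (approximate)

fp16_gb = N_PARAMS * 2 / 1e9  # 2 bytes per fp16 weight
fp32_gb = N_PARAMS * 4 / 1e9  # 4 bytes per fp32 weight

print(f"fp16 weights: ~{fp16_gb:.0f} GB")  # close to the ~40 GB of VRAM needed
print(f"fp32 weights: ~{fp32_gb:.0f} GB")  # close to the ~81 GB of CPU RAM seen
```

This is why instantiating on the CPU in fp32 before converting costs roughly double the VRAM footprint in system RAM.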