minimal-gpt-neox-20b

How much RAM is needed to run the model?

Open · texturejc opened this issue 2 years ago · 1 comment

Thanks for creating this implementation. I've tried running it in a Google Colab Pro notebook, but the session keeps crashing due to maxing out the RAM. Do you have any sense of how much RAM is needed to run the model? Thanks!

texturejc · Apr 06 '22 19:04

~40 GB of VRAM to load the model, ~45 GB to run inference at the full 2048-token context length. You need about twice that in CPU RAM (~81 GB), because the model is currently instantiated on the CPU and then converted to fp16 and uploaded to VRAM. I don't know if that's intended behaviour; it looks like it's supposed to create meta tensors, but that isn't actually working.
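For reference, here is a minimal sketch of the difference being described, using generic PyTorch rather than this repo's actual loading code. The `build_model` helper, the layer sizes, and the checkpoint filename are placeholders for illustration; the point is only that meta tensors avoid materializing a full fp32 copy of the weights in host RAM before the fp16 conversion.

```python
# Minimal sketch (not the repo's actual code) of the two loading paths.
import torch
import torch.nn as nn

def build_model(device=None):
    # Stand-in for the real GPT-NeoX-20B module graph.
    return nn.Sequential(
        nn.Linear(4096, 4096, device=device),
        nn.Linear(4096, 4096, device=device),
    )

# Current behaviour as described above: full fp32 weights are allocated on
# the CPU, then converted to fp16 and moved to the GPU, so peak host RAM is
# roughly an fp32 + fp16 copy of the model.
eager_model = build_model().half()

# Meta-tensor alternative: parameters are created without backing storage,
# the dtype switch to fp16 is free, and real memory is only allocated when
# to_empty() materializes (uninitialized) tensors on the target device.
meta_model = build_model(device="meta").half()
target = "cuda" if torch.cuda.is_available() else "cpu"
meta_model = meta_model.to_empty(device=target)

# The real weights would then be copied in from an fp16 checkpoint, e.g.:
# state_dict = torch.load("neox_20b_fp16.pt", map_location=target)  # hypothetical file
# meta_model.load_state_dict(state_dict)
```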

lopho · Apr 06 '22 19:04