ggml
[Feature request] Implement 8-bit GPT-J
Quantizing to 8-bit brings the weights down to ~11 GB vs. 16 GB in fp16. This is already available in PyTorch via `load_in_8bit=True`:
https://huggingface.co/hivemind/gpt-j-6B-8bit
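For reference, a minimal sketch of the core idea (per-row absmax int8 quantization of a weight matrix) is below. This is only an illustration of the storage savings, not the actual scheme used by the linked checkpoint, which relies on bitsandbytes' dynamic blockwise quantization; the function names here are made up for the example.

```python
import numpy as np

def quantize_int8(w):
    # Per-row absmax scale: map each row's floats into [-127, 127].
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    # Recover approximate fp32 weights; error is bounded by scale / 2.
    return q.astype(np.float32) * scale

np.random.seed(0)
w = np.random.randn(4, 8).astype(np.float32)   # toy fp32 weight matrix
q, s = quantize_int8(w)                        # int8 weights + fp32 scales
w2 = dequantize_int8(q, s)                     # approximate reconstruction
```

Storing `q` (1 byte/weight) plus one scale per row is what gets 6B parameters from ~16 GB of fp16 down to roughly 6 GB of raw weights; the ~11 GB figure for the linked model includes extra state kept in higher precision.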