YaLM-100B
Provide pruned version for weaker hardware
It would be really useful to have a pruned version of the model (like Balaboba) that can run on weaker GPU setups.
Also, quantization even down to 4 bits may be possible, as has been done successfully for LLaMA: https://github.com/ggerganov/llama.cpp
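For reference, here is a minimal sketch of what 4-bit round-to-nearest quantization looks like, using a single per-tensor scale in NumPy. This is a simplification for illustration only: real implementations such as llama.cpp use per-block scales and pack two 4-bit codes per byte, which this sketch does not do.

```python
import numpy as np

def quantize_4bit(weights: np.ndarray):
    """Naive symmetric 4-bit quantization: map floats to integers in [-8, 7]
    using one scale for the whole tensor."""
    scale = np.max(np.abs(weights)) / 7.0  # largest magnitude maps to +/-7
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the 4-bit codes."""
    return q.astype(np.float32) * scale

# Round-trip a small random weight matrix and measure reconstruction error.
w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_4bit(w)
w_hat = dequantize_4bit(q, s)
err = np.max(np.abs(w - w_hat))  # bounded by half a quantization step
```

With round-to-nearest and a per-tensor scale, the worst-case per-weight error is half a step (`scale / 2`); per-block scales shrink that error further, which is why llama.cpp's formats quantize in small blocks.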
+1, and this distributed-inference approach might also be very applicable here: https://petals.ml