llama2.mojo
TODO: Support for gguf models
Hey team, incredible work being done here.
Wondering if you only support .bin models, or whether it would also work with quantized GGUF models.
If not, consider this a feature request: most people work with GGUF models nowadays, since they are easier to run on consumer-grade hardware.
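For context on how the two formats differ at load time: a GGUF file can be identified up front by its four-byte ASCII magic `GGUF` at the start of the file, whereas the llama2.c-style `.bin` checkpoints have no magic and begin directly with the config header. A minimal sketch of such a format check (the function name `is_gguf` is just an illustration, not part of this project):

```python
def is_gguf(path: str) -> bool:
    """Return True if the file starts with the GGUF magic bytes.

    GGUF files begin with the ASCII magic "GGUF"; llama2.c-style
    .bin checkpoints have no magic, so this check distinguishes them.
    """
    with open(path, "rb") as f:
        magic = f.read(4)
    return magic == b"GGUF"
```

A loader could branch on this check to pick the right parsing path, though actually supporting GGUF also means handling its key-value metadata and quantized tensor blocks, which is the bulk of the work.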
thanks.