Jake Luciani
Hey @jatin-bhateja thanks for taking a look! So it looks like the ValueLayout isn't aligned and allocateDirect is? Am I reading it right?
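For context, here's roughly how I'm checking it (a minimal sketch on the Java 21+ FFM API; the 64-byte alignment and buffer sizes are just placeholders, not the actual Jlama code):

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteBuffer;

public class AlignmentCheck {
    public static void main(String[] args) {
        // Alignment the layout itself declares (4 bytes for JAVA_FLOAT by default)
        long layoutAlign = ValueLayout.JAVA_FLOAT.byteAlignment();
        System.out.println("ValueLayout.JAVA_FLOAT alignment: " + layoutAlign);

        // Segment allocated through an Arena with an explicit alignment request
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment aligned = arena.allocate(1024, 64); // ask for 64-byte alignment
            System.out.println("arena segment 64-byte aligned: " + (aligned.address() % 64 == 0));
        }

        // Segment wrapping a direct ByteBuffer; alignment is whatever the allocator happened to give
        ByteBuffer direct = ByteBuffer.allocateDirect(1024);
        MemorySegment wrapped = MemorySegment.ofBuffer(direct);
        System.out.println("allocateDirect segment 64-byte aligned: " + (wrapped.address() % 64 == 0));
    }
}
```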
Hi @jatin-bhateja I was able to reproduce the split-load drop with aligned memory but I don't see a 15% bump. I only see a ~5% improvement over arrays. Any idea...
Not yet, but I'll work on adding multimodal inputs.
Probably. Here's the current API call for chat: https://github.com/tjake/Jlama/blob/main/jlama-cli/src/main/java/com/github/tjake/jlama/cli/serve/GenerateResource.java
I could support some of the quantization types. Is that the main reason vs safetensors?
Hmm, can you give me an example? The GGUF and Safetensors versions of the same model with the same quantization are pretty much the same. Maybe they changed GGUF since I last...
Jlama does the downloading for you. It only needs 4 of the files
If there are models you would like me to quantize and upload, please request them here: https://github.com/tjake/Jlama/discussions/37
Hi, yeah I saw that and will consider adding it, but there are a couple of issues. Since this is a solo project I need to weigh the burden of supporting both....
Yes I've noticed this, just need to check if the SSE emitter is closed.
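Something along these lines (a rough sketch assuming a JAX-RS SseEventSink; the onToken callback is hypothetical, not the actual GenerateResource code):

```java
import jakarta.ws.rs.sse.Sse;
import jakarta.ws.rs.sse.SseEventSink;

// Guard every write with isClosed() so a disconnected client
// doesn't blow up the token-streaming loop.
void onToken(SseEventSink sink, Sse sse, String token) {
    if (sink.isClosed()) {
        // client went away; stop emitting (and the caller should cancel generation)
        return;
    }
    sink.send(sse.newEvent(token));
}
```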