Jake Luciani
Hey @jatin-bhateja thanks for taking a look! So it looks like the ValueLayout isn't aligned and allocateDirect is? Am I reading it right?
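For context, here's roughly how I'm checking it (a minimal sketch on the Java 21+ FFM API; the 64-byte alignment and buffer sizes are just placeholders, not the actual Jlama code):

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;
import java.nio.ByteBuffer;

public class AlignmentCheck {
    public static void main(String[] args) {
        // Alignment the layout itself declares (4 bytes for JAVA_FLOAT by default)
        long layoutAlign = ValueLayout.JAVA_FLOAT.byteAlignment();
        System.out.println("ValueLayout.JAVA_FLOAT alignment: " + layoutAlign);

        // Segment allocated through an Arena with an explicit alignment request
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment aligned = arena.allocate(1024, 64); // ask for 64-byte alignment
            System.out.println("arena segment 64-byte aligned: " + (aligned.address() % 64 == 0));
        }

        // Segment wrapping a direct ByteBuffer; alignment is whatever the allocator happened to give
        ByteBuffer direct = ByteBuffer.allocateDirect(1024);
        MemorySegment wrapped = MemorySegment.ofBuffer(direct);
        System.out.println("allocateDirect segment 64-byte aligned: " + (wrapped.address() % 64 == 0));
    }
}
```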
Hi @jatin-bhateja I was able to reproduce the split-load drop with aligned memory but I don't see a 15% bump. I only see a ~5% improvement over arrays. Any idea...
Not yet, but I'll work on adding multimodal inputs.
Probably. Here's the current API call for chat: https://github.com/tjake/Jlama/blob/main/jlama-cli/src/main/java/com/github/tjake/jlama/cli/serve/GenerateResource.java
I could support some of the quantization types. Is that the main reason vs safetensors?
Hmm, can you give me an example? The GGUF and Safetensors versions of the same model with the same quantization are pretty much the same. Maybe they changed GGUF since I last...
Jlama does the downloading for you. It only needs 4 of the files
If there are models you would like me to quantize and upload, please request them here: https://github.com/tjake/Jlama/discussions/37
Hi, yeah I saw that and will consider adding it, but there are a couple of issues. Since this is a solo project I need to weigh the burden of supporting both....
Yes I've noticed this, just need to check if the SSE emitter is closed.
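Something along these lines (a rough sketch assuming a JAX-RS SseEventSink; the onToken callback is hypothetical, not the actual GenerateResource code):

```java
import jakarta.ws.rs.sse.Sse;
import jakarta.ws.rs.sse.SseEventSink;

// Guard every write with isClosed() so a disconnected client
// doesn't blow up the token-streaming loop.
void onToken(SseEventSink sink, Sse sse, String token) {
    if (sink.isClosed()) {
        // client went away; stop emitting (and the caller should cancel generation)
        return;
    }
    sink.send(sse.newEvent(token));
}
```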