Endre Stølsvik
Just quick pastes from Gitter, since family calls: Point is, if you have (as per your example) -Xmx2G maxbytes=8G and maxphysicalbytes=8G, then you ACTUALLY only have 6GB available for off-heap. Because...
So, basically, how this works out for dl4j: You need to set maxphysicalbytes to the highest number of bytes you want the process to take in total, including all three of the JVM...
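A minimal sketch of the arithmetic, assuming JavaCPP's reporting accessors on org.bytedeco.javacpp.Pointer (maxBytes(), maxPhysicalBytes()); the class name and printout are illustrative only:

```java
import org.bytedeco.javacpp.Pointer;

// Run with e.g. -Xmx2G -Dorg.bytedeco.javacpp.maxbytes=8G -Dorg.bytedeco.javacpp.maxphysicalbytes=8G
public class JavaCppMemoryCheck {
    public static void main(String[] args) {
        long heapMax = Runtime.getRuntime().maxMemory();   // roughly -Xmx
        long maxBytes = Pointer.maxBytes();                // off-heap allocation cap tracked by JavaCPP
        long maxPhysical = Pointer.maxPhysicalBytes();     // cap on the WHOLE process' physical memory

        // Since maxphysicalbytes covers heap + off-heap + JVM overhead, the effective
        // off-heap headroom is roughly maxPhysical minus the heap: ~6GB in the 2G/8G example.
        long effectiveOffHeap = maxPhysical - heapMax;

        System.out.printf("heapMax=%d, maxBytes=%d, maxPhysical=%d -> ~%d effective off-heap%n",
                heapMax, maxBytes, maxPhysical, effectiveOffHeap);
    }
}
```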
I believe this goes for every model, and it is clearly a bit problematic. That is, the prompt template (and "history template") of the "chat interfaces" with the different models...
Oh, I guess this is exactly what https://github.com/Mozilla-Ocho/llamafile/issues/65 is about. Pointing to this blogpost: https://huggingface.co/blog/chat-templates
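To make the problem concrete, here is a sketch of how the same two-message exchange renders under two well-known template families, ChatML-style and Llama-2-chat-style (illustrative only; the exact tokens and whitespace vary per model and version):

```java
public class ChatTemplates {
    public static void main(String[] args) {
        String system = "You are a helpful assistant.";
        String user = "Hello!";

        // ChatML-style template:
        String chatMl = "<|im_start|>system\n" + system + "<|im_end|>\n"
                + "<|im_start|>user\n" + user + "<|im_end|>\n"
                + "<|im_start|>assistant\n";

        // Llama-2-chat-style template:
        String llama2 = "<s>[INST] <<SYS>>\n" + system + "\n<</SYS>>\n\n" + user + " [/INST]";

        System.out.println(chatMl);
        System.out.println(llama2);
    }
}
```

Feed a model a prompt in the wrong template and it will usually still answer, just noticeably worse - which is exactly why shipping the template with the model (as the Hugging Face post argues) matters.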
Definitely would be a welcome addition, yes! :+1: Edit: There is already an issue in 'llama.cpp' about lots of features that should go into the server; I added a comment...
The OpenAI-compatible embeddings-endpoint is directly mentioned here, I realize: https://github.com/ggerganov/llama.cpp/issues/4216#issuecomment-1858542650
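For reference, a minimal sketch of what such an OpenAI-compatible embeddings call looks like from Java. The host, port, and model name are assumptions; the request/response shape is the standard OpenAI one ({"model": ..., "input": ...} in, {"data": [{"embedding": [...]}]} out):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class EmbeddingsCall {
    public static void main(String[] args) throws Exception {
        // OpenAI-style embeddings request body.
        String body = "{\"model\": \"default\", \"input\": \"Hello, world!\"}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8080/v1/embeddings")) // assumed local server
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // Expected OpenAI shape: {"data": [{"embedding": [0.01, ...], ...}], ...}
        System.out.println(response.body());
    }
}
```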
(Funny, that - exactly one year later, I am again thinking about this, about to create an issue, but the "The following issues might be related" box caught me.) I think...
Judging by serialization/deserialization times, and compress/decompress times (once I made these timings available!), this won't shave more than a few milliseconds off the total processing time for a...
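A sketch of the kind of timing that underlies that judgment, using the JDK's Deflater directly (the payload here is hypothetical; real numbers obviously depend on message size and compression level):

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class CompressTiming {
    public static void main(String[] args) {
        // Hypothetical stand-in for a serialized DTO.
        byte[] payload = "example DTO serialized to JSON...".repeat(1000)
                .getBytes(StandardCharsets.UTF_8);

        long start = System.nanoTime();
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(payload);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        long micros = (System.nanoTime() - start) / 1_000;

        System.out.printf("compressed %d -> %d bytes in %d us%n",
                payload.length, out.size(), micros);
    }
}
```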
Interesting blog post about UTF-8 encoding: http://psy-lob-saw.blogspot.no/2012/12/encode-utf-8-string-to-bytebuffer-faster.html
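The gist of that post is that encoding via a (reusable) CharsetEncoder straight into a target ByteBuffer can avoid the intermediate byte[] that String.getBytes() allocates; a minimal sketch of the two routes compared there:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class Utf8Encode {
    public static void main(String[] args) {
        String s = "Hello, Stølsvik!";

        // Simple route: allocates an intermediate byte[] before wrapping.
        ByteBuffer viaGetBytes = ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));

        // CharsetEncoder route: encodes straight into a target buffer;
        // the encoder can be reset and reused across calls.
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        ByteBuffer direct = ByteBuffer.allocate(s.length() * 3); // UTF-8 worst case per char
        encoder.encode(CharBuffer.wrap(s), direct, true);
        direct.flip();

        System.out.println(viaGetBytes.remaining() + " / " + direct.remaining() + " bytes");
    }
}
```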
I had evidently forgotten this case when making #45, which is identical - and closed by https://github.com/centiservice/mats3/commit/84ba5747efe4c5dacc01098601f2641a2c196831 So, I'll reuse this for what I had forgotten there: Make it available...