Pierrick Hymbert

Results: 16 comments by Pierrick Hymbert

Hello, I submitted a new feature to support Java 8 streams; it allows handling the response body asynchronously, without waiting for the full body before processing. Although request...
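A minimal sketch of the idea behind consuming a body as a Java 8 stream (not the actual PR code): once the client exposes the body as a `Stream<String>`, lines can be filtered and short-circuited without ever buffering the full body. The `firstMatching` helper and the in-memory "body" below are illustrative assumptions; a real client would supply the stream, e.g. via `java.net.http.HttpResponse.BodyHandlers.ofLines()`.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamingBodyDemo {
    // Process body lines as they arrive instead of buffering the whole body.
    // The stream here is simulated in memory; an HTTP client exposing the
    // body as a Stream<String> would plug in the same way.
    static List<String> firstMatching(Stream<String> bodyLines, String prefix, long limit) {
        return bodyLines
                .filter(line -> line.startsWith(prefix))
                .limit(limit) // short-circuits: the rest of the body is never read
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Stream<String> body = Stream.of("data: a", "ping", "data: b", "data: c");
        System.out.println(firstMatching(body, "data: ", 2)); // [data: a, data: b]
    }
}
```

Because `limit` is a short-circuiting operation, processing stops as soon as enough matching lines have been seen, which is the point of streaming the body rather than waiting for it in full.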

Support for Deepseek models is in progress: - https://github.com/ggerganov/llama.cpp/pull/5464#issuecomment-1974818993 - #5981 - #6252

@userbox020 it looks like `Q8_0` quantization is not supported: https://github.com/ggerganov/llama.cpp/blob/8f1be0d42f23016cb6819dbae01126699c4bd9bc/llama.cpp#L4488-L4502 You might notice with `openhermes-2.5-neural-chat-v3-3-slerp.Q8_0.gguf`: ``` llama_model_load: disabling Kompute due to unsupported model arch or quantization ``` Tested `openhermes-2.5-neural-chat-v3-3-slerp.Q4_0.gguf` with `NVIDIA...

Thanks for the effort to bring this nice feature :1st_place_medal:. Please push commits to your fork first, as each push here triggers a lot of CI runs on the main...

> @phymbert sorry for the CI noise again today, wanted to get the PR in good working order. Please forgive my comment, firstly because I am sure you do your...

Hello, which model architecture? Please share the steps to reproduce your issue.

> might you have some guidance here? shall i add a sample shell script or extend the python test suite? I suggest adding a simple dedicated scenario in a new...

> One way to improve this even further and help new contributors to implement tests, is to reference a very small PR that introduces a basic server test, without any...

For those struggling to figure out what CANN is: https://support.huaweicloud.com/intl/en-us/usermanual-cce/cce_10_0239.html Great!

~~@ggerganov please restart the CI github manager for benchmark~~ EDIT: the job just failed: https://github.com/ggerganov/llama.cpp/actions/runs/8753504588/job/24040839682?pr=6766