Pierrick Hymbert

Results: 16 comments by Pierrick Hymbert

Hello, I submitted a new feature to support Java 8 streams; it allows handling the response body asynchronously, without waiting for the full body before processing. Although request...
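A minimal sketch of the idea behind consuming a body as a Java 8 stream (not the actual PR code): once the client exposes the body as a `Stream<String>`, lines can be filtered and short-circuited without ever buffering the full body. The `firstMatching` helper and the in-memory "body" below are illustrative assumptions; a real client would supply the stream, e.g. via `java.net.http.HttpResponse.BodyHandlers.ofLines()`.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StreamingBodyDemo {
    // Process body lines as they arrive instead of buffering the whole body.
    // The stream here is simulated in memory; an HTTP client exposing the
    // body as a Stream<String> would plug in the same way.
    static List<String> firstMatching(Stream<String> bodyLines, String prefix, long limit) {
        return bodyLines
                .filter(line -> line.startsWith(prefix))
                .limit(limit) // short-circuits: the rest of the body is never read
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Stream<String> body = Stream.of("data: a", "ping", "data: b", "data: c");
        System.out.println(firstMatching(body, "data: ", 2)); // [data: a, data: b]
    }
}
```

Because `limit` is a short-circuiting operation, processing stops as soon as enough matching lines have been seen, which is the point of streaming the body rather than waiting for it in full.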

Support for Deepseek models is in progress: - https://github.com/ggerganov/llama.cpp/pull/5464#issuecomment-1974818993 - #5981 - #6252

@userbox020 it looks like `Q8_0` quantization is not supported: https://github.com/ggerganov/llama.cpp/blob/8f1be0d42f23016cb6819dbae01126699c4bd9bc/llama.cpp#L4488-L4502 You might notice with `openhermes-2.5-neural-chat-v3-3-slerp.Q8_0.gguf`: ``` llama_model_load: disabling Kompute due to unsupported model arch or quantization ``` Tested `openhermes-2.5-neural-chat-v3-3-slerp.Q4_0.gguf` with `NVIDIA...

Thanks for the effort to bring this nice feature :1st_place_medal:. Please push commits to your fork first, as each push here triggers a lot of CI runs on the main...

> @phymbert sorry for the CI noise again today, wanted to get the PR in good working order. Please forgive my comment, firstly because I am sure you do your...

Hello, which model architecture? Please share the steps to reproduce your issue.

> might you have some guidance here? shall i add a sample shell script or extend the python test suite? I suggest adding a simple dedicated scenario in a new...

> One way to improve this even further and help new contributors to implement tests, is to reference a very small PR that introduces a basic server test, without any...

For those struggling to figure out what CANN is: https://support.huaweicloud.com/intl/en-us/usermanual-cce/cce_10_0239.html Great!

~~@ggerganov please restart the CI github manager for benchmark~~ EDIT: the job just failed: https://github.com/ggerganov/llama.cpp/actions/runs/8753504588/job/24040839682?pr=6766