Jesse comments

Results: 99 comments of Jesse

[Bug] deepseek v3/r1 with full context with balance_serve backend

@magikRUKKOLA Sorry I didn't get to it last night. I tried to apply these diffs this morning and my patch tool is saying: ``` patch: **** Only garbage was found...

[Bug] deepseek v3/r1 with full context with balance_serve backend

Ah. There are mixed tab and space characters in those files, and therefore in the diffs, which makes them hard to apply - probably because GitHub or markdown is stripping...
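One way to confirm that kind of whitespace problem is to scan the source files for inconsistent indentation before generating or applying a diff. The following is a minimal sketch, not something from the thread: the function names `indentation_report` and `has_mixed_indentation` are hypothetical, and it is meant to be run on the source files themselves, since lines in a unified diff carry a one-character prefix (`+`, `-`, or a space) that would skew the result.

```python
from collections import Counter


def indentation_report(text: str) -> Counter:
    """Count indented lines by the kind of leading whitespace they use."""
    counts = Counter()
    for line in text.splitlines():
        # Leading run of spaces and tabs only.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if not indent:
            continue  # unindented or empty line
        if " " in indent and "\t" in indent:
            counts["mixed"] += 1
        elif "\t" in indent:
            counts["tabs"] += 1
        else:
            counts["spaces"] += 1
    return counts


def has_mixed_indentation(text: str) -> bool:
    """True if any line mixes tabs and spaces, or the file uses both styles."""
    counts = indentation_report(text)
    return counts["mixed"] > 0 or (counts["tabs"] > 0 and counts["spaces"] > 0)
```

Reading a file and passing its contents to `has_mixed_indentation` gives a quick yes/no answer; `indentation_report` shows how many lines fall into each category.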

[Bug] deepseek v3/r1 with full context with balance_serve backend

Tested with:
```bash
python ktransformers/server/main.py \
  --port 11434 \
  --model_path /data/DeepSeek-V3 \
  --model_name "DeepSeek-V3-0324:671b-q4_k_xl" \
  --gguf_path /data/DeepSeek-V3-0324-GGUF-UD/UD-Q4_K_XL \
  --optimize_config_path /home/jesse/ktransformers/ktransformers/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat.yaml \
  --temperature 0.3 \
  --cpu_infer 30 \
  --cache_lens 131072 \
  ...
```

[Bug] deepseek v3/r1 with full context with balance_serve backend

> I am pretty sure that the problem is similar to what is going on with balance_serve and R1/V3 can fix it. But should we do that? The thing is...

[Bug] deepseek v3/r1 with full context with balance_serve backend

> Let's see if the authors will merge the PR you made. If not, I am forking it lol and doing everything properly. I wish I knew enough about LLM...

[Bug] deepseek v3/r1 with full context with balance_serve backend

`llama.cpp` is faster for me in tok/s than `ktransformers`, due to the way it handles NUMA. I can run it with NPS4 and tune the number of CPU threads to...

[Authorization] Support token exchange protocol for clients that do not interact with the user

Yeah, so `client` might be a better name for the `proxy`?

Feature Request: Kimi-K2-Thinking reasoning and tool calling support

Hey @KiruyaMomochi. I wrote the latest iteration of the DS 3.1 tool calling code. I wrote a ton of unit tests. Strongly recommend writing your own in the same style...

Feature Request: Kimi-K2-Thinking reasoning and tool calling support

https://github.com/ggml-org/llama.cpp/pull/16932 sort of works, but in my testing with OpenHands it keeps stopping for some reason. I have to type "continue" constantly and it gets stuck in repetitive loops....