Jesse
@magikRUKKOLA Sorry I didn't get to it last night. I tried to apply these diffs this morning and my patch tool is saying: ``` patch: **** Only garbage was found...
Ah. There are mixed tab and space characters in those files, and therefore in the diffs, which makes them hard to apply - probably because github or markdown is stripping...
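For anyone who wants to check a file for this before making a diff, here is a rough sketch in Python. It is only an illustration: the function name `indent_summary` and the sample text are made up for this example and are not part of ktransformers or of any patch tool. It counts how many lines are indented with tabs only, with spaces only, or with both.

```python
def indent_summary(text):
    """Count indented lines by the kind of leading whitespace they use.

    Hypothetical helper for illustration only.
    """
    counts = {"tabs": 0, "spaces": 0, "mixed": 0}
    for line in text.splitlines():
        # Leading run of spaces/tabs on this line.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if not indent:
            continue  # unindented or empty line
        has_tab = "\t" in indent
        has_space = " " in indent
        if has_tab and has_space:
            counts["mixed"] += 1
        elif has_tab:
            counts["tabs"] += 1
        else:
            counts["spaces"] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample: one tab-indented, one space-indented, one mixed line.
    sample = "def f():\n\treturn 1\n    x = 2\n \ty = 3\n"
    print(indent_summary(sample))
```

If more than one of the counts is non-zero, a diff pasted through a comment box is likely to lose the distinction, so attaching the diff as a file is probably safer than pasting it inline.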
Tested with: ```bash python ktransformers/server/main.py \ --port 11434 \ --model_path /data/DeepSeek-V3 \ --model_name "DeepSeek-V3-0324:671b-q4_k_xl" \ --gguf_path /data/DeepSeek-V3-0324-GGUF-UD/UD-Q4_K_XL \ --optimize_config_path /home/jesse/ktransformers/ktransformers/ktransformers/optimize/optimize_rules/DeepSeek-V3-Chat.yaml \ --temperature 0.3 \ --cpu_infer 30 \ --cache_lens 131072 \...
> I am pretty sure that the problem is similar to what is going on with balance_serve and R1/V3 can fix it. But should we do that? The thing is...
> Let's see if the authors will merge the PR you made. If not, I am forking it lol and doing everything properly. I wish I knew enough about LLM...
`llama.cpp` is faster for me in tok/s than `ktransformers`, due to the way it handles NUMA. I can run it with NPS4 and tune the number of CPU threads to...
Yeah, so `client` might be a better name for the `proxy`?
Hey @KiruyaMomochi. I wrote the latest iteration of the DS 3.1 tool calling code. I wrote a ton of unit tests. Strongly recommend writing your own in the same style...
https://github.com/ggml-org/llama.cpp/pull/16932 sort of works, but in my testing with Open Hands it keeps stopping for some reason. I have to type "continue" constantly and it gets stuck in repetitive loops....