Diego Devesa
I have added the bug tag that will prevent the bot from closing the issue. Pointing at the specific PRs that introduced a regression would improve the chances of this...
You may be able to get it to run by increasing `LLAMA_MAX_NODES` in `llama.cpp`.
If you do, you would also need to increase `GGML_SCHED_MAX_SPLITS`. Alternatively, using a build without GPU acceleration would also work.
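For reference, both of these are compile-time constants, so raising them means editing the source and rebuilding. A sketch of the change, assuming `LLAMA_MAX_NODES` is defined in `llama.cpp` and `GGML_SCHED_MAX_SPLITS` in the ggml backend sources (the exact default values and file locations vary by version, so check your checkout first):

```cpp
// In llama.cpp — value is illustrative, not the upstream default:
#define LLAMA_MAX_NODES       16384   // raised so larger graphs fit

// In the ggml backend source — must grow along with the node limit:
#define GGML_SCHED_MAX_SPLITS 2048
```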
The issue with 2 expert models should be fixed in #6735.
> If you don't know how many layers there are, you can use -1 to move all to GPU.

That's not the case in the llama.cpp C API.
> I'd like to know what standards should be met before merge this PR? Can this PR be merged first and then continue to fix the above problems? I can...
You can also make an operation run on the CPU by returning `false` from `supports_op`.
There should still be some limit to avoid getting into an infinite loop in the server.
When this happens, the response of `/completion` has these fields:

```json
"truncated": true,
"stopped_eos": false,
"stopped_word": false,
"stopped_limit": false,
```

I am not familiar with the meaning of each of...
Maybe it would be simpler to set `n_predict` to `n_ctx_train` by default if not set in the request.
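A sketch of that defaulting logic, assuming the server's convention that a negative `n_predict` means the request did not set a limit (the helper name is hypothetical, not the actual server code):

```cpp
#include <cstdint>

// Hypothetical helper: choose the effective prediction limit for a
// /completion request. n_predict < 0 means "not set in the request",
// in which case we cap generation at the model's training context size
// so the server cannot loop forever.
static int32_t effective_n_predict(int32_t n_predict, int32_t n_ctx_train) {
    return n_predict < 0 ? n_ctx_train : n_predict;
}
```

With this, an unset request falls back to the training context length, while an explicit value is respected as-is.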