Diego Devesa
It's not likely to be an incompatibility with the GPU architecture; in fact, the ggml-ci tests every commit on master on a PCIe V100. Whatever the issue is, it seems...
Could you add some documentation about how to use the `CMakePresets.json` file? A comment in the PR description is enough. If I understand correctly, this is not being used in...
I am not sure that we need to make changes to accommodate what seems to be a buggy or misconfigured VS Code extension. FWIW, I use VS Code, but not...
It seems that the Kompute backend is missing f32 get_rows. It already supports f16, so hopefully it will be easy to add f32.
The changes to the llama.h public structs are effectively an API-breaking change for no real benefit. The other structs are less sensitive since they are internal to ggml, but...
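As a generic illustration (these are hypothetical structs, not the actual ones from `llama.h` or this PR), inserting or reordering a field in a struct that is part of the public C API shifts the offsets of every field after it, so applications compiled against the old header still build but read the wrong memory at runtime until they are rebuilt:

```c
#include <stdint.h>

// Hypothetical public structs: inserting a field in the middle changes the
// offsets and size of the struct, which is an ABI break for any application
// compiled against the previous header.
struct example_params_v1 {
    int32_t n_ctx;      // offset 0
    float   rope_freq;  // offset 4
};

struct example_params_v2 {
    int32_t n_ctx;      // offset 0
    int32_t n_batch;    // new field inserted here
    float   rope_freq;  // now at offset 8 instead of 4
};
```

Appending new fields at the end of such structs is the usual way to extend them without breaking existing users.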
I am not sure that this change is necessary, or the color code stuff.
I don't think that people working enough on the llama.cpp code to benefit from ccache need to be reminded to run `make clean` before disabling it, and the link to...
There are more leaks in this function. #6289 has the fixes.
We do not want to apply optimizations that serve no real purpose but make the code harder to read. And anyway, the compiler is perfectly capable of optimizing a `strlen(s)...
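The rest of that sentence is truncated above; as a hypothetical illustration, assuming the change replaced a readable `strlen(s) == 0` check with a manual first-byte test, GCC and Clang treat `strlen` as a builtin and, with optimizations enabled, compile both forms to the same single-byte load and compare:

```c
#include <stdbool.h>
#include <string.h>

// Hypothetical example of the kind of rewrite being discussed; with
// optimizations enabled, both functions compile to the same code, so the
// manual version only costs readability.
static bool is_empty_readable(const char * s) {
    return strlen(s) == 0;
}

static bool is_empty_manual(const char * s) {
    return s[0] == '\0';
}
```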
You would need to modify `ggml_backend_registry_init` to register the backend, then it should be automatically used by `test-backend-ops`. https://github.com/ggerganov/llama.cpp/blob/54770413c484660d021dd51b5dbacab7880b8827/ggml-backend.c#L411
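As a rough sketch of what that could look like (the backend name, init function, and buffer type below are placeholders, and the exact `ggml_backend_register` signature should be checked against the linked revision of `ggml-backend.c`):

```c
// Sketch only: the "mybackend" identifiers are placeholders for the new backend.
// New backends are registered from ggml_backend_registry_init, following the
// same pattern used for the existing backends.
GGML_CALL static void ggml_backend_registry_init(void) {
    ggml_backend_registry_initialized = true;

    ggml_backend_register("CPU", ggml_backend_reg_cpu_init, ggml_backend_cpu_buffer_type(), NULL);

#ifdef GGML_USE_MYBACKEND
    // forward declarations to avoid including the backend header here
    extern GGML_CALL ggml_backend_t ggml_backend_reg_mybackend_init(const char * params, void * user_data);
    extern GGML_CALL ggml_backend_buffer_type_t ggml_backend_mybackend_buffer_type(void);

    ggml_backend_register("mybackend", ggml_backend_reg_mybackend_init,
                          ggml_backend_mybackend_buffer_type(), NULL);
#endif
}
```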