Gaurav Garg comments

Results: 3 comments by Gaurav Garg

ggml: avoid rebuild of GGML graph for each token (#7456)

@ggerganov @slaren @agray3 I'm interested in reducing the CPU overhead associated with building the GGML graph and would like to follow up on this PR. In particular, I'd like to...

ggml: avoid rebuild of GGML graph for each token (#7456)

Thanks @ggerganov for the quick response. This is exactly what I was proposing above: "A potential solution is to introduce a specialized copy operator for the KV cache that fuses...

ggml: avoid rebuild of GGML graph for each token (#7456)

Thanks, this makes sense. Do we need a specialized function to handle the transposed v-cache, or will `ggml_set_rows` be enough? Check this part of the code: https://github.com/ggml-org/llama.cpp/blob/7675c555a13c9f473249e59a54db35032ce8e0fc/src/llama-kv-cache-unified.cpp#L668-L673 Update: Never mind, I...