Pierrick Hymbert
I was not aware, but this is not asserted in the parallel test suite AFAIK. Also, I recall that each architecture generates different results.
@TevinWang do you need help with the global variable issue? @ggerganov do you confirm the issue with the proposed approach? IMHO this contribution is valuable as the console output...
@rgerganov Nice to meet you :D
Hi, you should have a look at llama.cpp: https://github.com/ggerganov/llama.cpp/blob/f184dd920852d6d372b754f871ee06cfe6f977ad/llama.cpp#L13599
Please submit a PR :)
@ggerganov @ngxson @slaren I'd appreciate your early feedback on the approach before I implement too much.
> Servers with T4 GPU are usually "shared CPU but dedicated GPU". I believe that's also the case with other GPU like A100 or A10G, but not sure if it's...
@ggerganov We need to keep this in mind:

> Warning: We recommend that you only use self-hosted runners with private repositories. This is because forks of your public repository can potentially...
@ggerganov what about the defragmentation target for the [baseline](https://github.com/ggerganov/llama.cpp/pull/6283/files#diff-a5e740be96415373789689f814583e93ff2a8f05eae6481e94505fd6cb6bc6a7)? Without it, I see a lot of: `update_slots : failed to find free space in the KV cache, retrying with smaller n_batch =...
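For illustration, a minimal sketch of what enabling the defragmentation threshold on the server could look like (model path, context size and threshold value are placeholders, not the actual benchmark configuration):

```sh
# Hypothetical baseline invocation; model path and values are placeholders.
# --defrag-thold triggers KV cache defragmentation once fragmentation
# exceeds the given fraction (negative values disable it).
./server -m models/model.gguf \
    --parallel 8 \
    --ctx-size 16384 \
    --defrag-thold 0.1
```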
First workflow ready to receive feedback:
- comment added automatically: https://github.com/phymbert/llama.cpp/pull/1#issuecomment-2018674798
- workflow running on the Azure T4 self-hosted GitHub runner: https://github.com/phymbert/llama.cpp/actions/runs/8428687623/job/23082119753
- code: https://github.com/ggerganov/llama.cpp/pull/6283

Based on this, we...
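As a rough sketch of the shape such a workflow takes (trigger, runner labels and build flags here are assumptions for illustration, not the actual content of #6283):

```yml
# Hypothetical minimal workflow; trigger, labels and build flags are
# placeholders, not the actual content of #6283.
name: Benchmark

on:
  workflow_dispatch:

jobs:
  bench:
    # Route the job to the self-hosted GPU runner via its labels.
    runs-on: [self-hosted, Linux, X64]
    steps:
      - uses: actions/checkout@v4
      - name: Build server
        run: |
          mkdir build && cd build
          cmake .. -DLLAMA_CUBLAS=ON
          cmake --build . --target server
```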