Matvey Soloviev
Requesting review; it seems that it wouldn't actually let me review and merge myself (...even though I possibly could have merged locally and pushed straight to master?)
Thank you! @FNsi, is this the Q4_1 quantization you are testing? The filename says models/30B/ggml-model-q4_0.bin. How is the performance at 30B?
I was independently trying to do something similar on the Q4_1 code [here](https://github.com/ggerganov/llama.cpp/tree/q4_1_more_accel). I managed to squeeze out somewhere around 5% more performance by rearranging the SIMD math and avoiding...
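For context, the Q4_1 dot product can be rearranged so the inner loop is pure integer work, with the per-block scale and offset applied once per block. The following is a scalar sketch of that algebraic identity only, not the code in the linked branch; the `block_q4_1` layout and nibble packing shown here are assumptions:

```c
// Illustrative scalar sketch only -- not the code from the q4_1_more_accel branch.
// Q4_1 stores, per block of QK values, a scale d and an offset m, so that
// each dequantized value is d*q + m with q in [0, 15].
#include <stdint.h>
#include <stddef.h>

#define QK 32

typedef struct {
    float   d;             // scale
    float   m;             // offset (block minimum)
    uint8_t qs[QK / 2];    // 4-bit quants, two per byte (layout assumed)
} block_q4_1;

// Dot product of n values, rearranged so the inner loop is integer-only:
// sum((d_x*qx + m_x)*(d_y*qy + m_y))
//   = d_x*d_y*sum(qx*qy) + d_x*m_y*sum(qx) + m_x*d_y*sum(qy) + QK*m_x*m_y
static float dot_q4_1(const block_q4_1 *x, const block_q4_1 *y, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n / QK; ++i) {
        int sum_xy = 0, sum_x = 0, sum_y = 0;
        for (int j = 0; j < QK / 2; ++j) {
            const int x0 = x[i].qs[j] & 0x0F, x1 = x[i].qs[j] >> 4;
            const int y0 = y[i].qs[j] & 0x0F, y1 = y[i].qs[j] >> 4;
            sum_xy += x0 * y0 + x1 * y1;
            sum_x  += x0 + x1;
            sum_y  += y0 + y1;
        }
        acc += x[i].d * y[i].d * sum_xy
             + x[i].d * y[i].m * sum_x
             + x[i].m * y[i].d * sum_y
             + QK * x[i].m * y[i].m;
    }
    return acc;
}
```

The same identity carries over to a SIMD path: the three integer sums can be accumulated with vector instructions and the four floating-point terms folded in once per block.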
@ggerganov Thanks for paging me in! As far as I can see, there would be no conflict with anything I'm doing. I think this is a good change in terms...
I think it's good to not force interactive mode immediately (in fact that was how it worked when I first made the patch, but the logic seems to have changed...
@tjohnman Thanks! Wasn't meaning to imply you had anything to do with the removal - development has been moving quickly and chaotically, and it probably just fell by the wayside in...
I'm playing around with local search for the q4_1 parameters now, with something like the following approximately in place of the inner loop of `quantize_row_q4_1`: `round_block(pp, x + i*QK, ...`
I tried to run the previously mentioned Q4_1 quantization method with some number of local relaxation steps to reduce the squared error (down to 83% of the naive computation's error...
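As a rough illustration of the kind of local search described in the two comments above (the `round_block` helper is truncated in the excerpt, so all names here are hypothetical), one block of QK values can start from the naive min/delta choice and then take a few relaxation steps that nudge the parameters, keeping whatever lowers the squared reconstruction error:

```c
// Hypothetical sketch of a per-block local search; not the actual patch.
#include <math.h>
#include <float.h>

#define QK 32

// Squared reconstruction error of one block for a given (d, m),
// quantizing each value to the nearest q in [0, 15].
static float block_error(const float *x, float d, float m) {
    float err = 0.0f;
    for (int l = 0; l < QK; ++l) {
        int q = (int)roundf((x[l] - m) / d);
        if (q < 0)  q = 0;
        if (q > 15) q = 15;
        const float r = d * q + m - x[l];
        err += r * r;
    }
    return err;
}

// Start from the naive min/delta choice, then take a few relaxation steps:
// nudge d and m by a shrinking step and keep any change that lowers the error.
static void refine_block_params(const float *x, float *d_out, float *m_out, int steps) {
    float min = FLT_MAX, max = -FLT_MAX;
    for (int l = 0; l < QK; ++l) {
        if (x[l] < min) min = x[l];
        if (x[l] > max) max = x[l];
    }
    float d = (max - min) / 15.0f;
    float m = min;
    if (d == 0.0f) { *d_out = 1.0f; *m_out = m; return; }

    float best = block_error(x, d, m);
    float step = d * 0.5f;
    for (int it = 0; it < steps; ++it, step *= 0.5f) {
        const float cand_d[2] = { d + step, d - step };
        const float cand_m[2] = { m + step, m - step };
        for (int k = 0; k < 2; ++k) {
            float e = block_error(x, cand_d[k], m);
            if (e < best) { best = e; d = cand_d[k]; }
            e = block_error(x, d, cand_m[k]);
            if (e < best) { best = e; m = cand_m[k]; }
        }
    }
    *d_out = d;
    *m_out = m;
}
```

The number of relaxation steps is the knob: it trades extra quantization time against how much of the naive method's error can be recovered.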
@anzz1 The reverse prompt can span multiple tokens. However, there is no way for it to interrupt generation mid-token. (That's why I opted to use token vectors rather than strings...
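A minimal sketch of what matching a reverse prompt as a token vector looks like (illustrative only, not the actual llama.cpp implementation; the type alias is made up): at each token boundary, the tail of the generated token ids is compared against the reverse-prompt ids.

```c
// Illustrative sketch (not the actual llama.cpp code): with the reverse prompt
// kept as a vector of token ids, generation can only be interrupted at token
// boundaries -- we check whether the tail of the output matches the vector.
#include <stddef.h>
#include <stdbool.h>

typedef int llama_token_id; // hypothetical alias for a token id

static bool ends_with_reverse_prompt(const llama_token_id *output, size_t n_output,
                                     const llama_token_id *rev_prompt, size_t n_rev) {
    if (n_rev == 0 || n_rev > n_output) {
        return false;
    }
    for (size_t i = 0; i < n_rev; ++i) {
        if (output[n_output - n_rev + i] != rev_prompt[i]) {
            return false;
        }
    }
    return true;
}
```

Because the check runs only after a full token has been sampled, a reverse prompt can never stop generation earlier than the next token boundary.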
What directory are you running it from, and which directory did you move the json file (which one?) into? Have you run `make install`? Notekit should be capable of...