Alberto Cabrera Pérez
@ggerganov I've addressed all your comments. Let me know if something else is required.
Mmm. Let's revert this then. I'll reopen the branch as a draft PR and we can work out a better solution. I'd rather not introduce a regression upstream...
@ggerganov is there something else needed from my side, or are we waiting for another review?
I was able to replicate the PPL skyrocketing with the generic implementation as well:

```
# ggml_gemm_q4_K_8x8_q8_K_generic
perplexity: 34.48 seconds per pass - ETA 1.43 minutes
[1]9.6770,[2]1762.7802,[3]9505.4348,[4]22802.6452,[5]5311.2750,[6]10333.9703,[7]16582.8044,[8]23315.3388,[9]11093.7993,[10]14942.7293,

# ggml_gemm_q4_K_8x8_q8_K
perplexity:...
```
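For reference, this kind of perplexity measurement can be reproduced with the stock `llama-perplexity` tool; a minimal sketch, where the model and dataset paths are placeholders and not taken from this PR:

```sh
# Hypothetical invocation; adjust paths to the actual build and files.
# -m: a Q4_K-quantized model, which exercises the repacked GEMM path
# -f: raw evaluation text; per-pass perplexity is printed as in the log above
./build/bin/llama-perplexity \
  -m models/model-q4_k_m.gguf \
  -f wikitext-2-raw/wiki.test.raw
```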
I've opened #17030 for the fix.

> Hm yes - `Q4_0` with LFM is indeed also problematic. However `Q4_0` with llama 3.1 8B is good. So this means there is...
@ggerganov https://github.com/ggml-org/llama.cpp/pull/17241 fixed the perplexity issues, so this PR is ready for review again (it's rebased on top of master).
@ggerganov sorry for pinging again! I don't have merge rights. Could you please merge it?
Ah, sorry for the misunderstanding! I had another PR merged with a single review and didn't realize both approvals were needed here. Thanks!
LGTM, great work @joeatodd
@lslusarczyk Mind rebasing on top of master? I'm seeing very bad performance in this PR, but it seems to be related to not having #13343 in this branch. A previous...
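For completeness, a typical rebase of the feature branch onto current master looks roughly like this; remote and branch names are assumptions, not taken from the PR:

```sh
# Assumed remote/branch names; adjust to the actual fork setup.
git fetch upstream                                   # upstream = ggml-org/llama.cpp
git rebase upstream/master                           # replays the branch on top of master (picks up #13343)
git push --force-with-lease origin my-feature-branch # update the PR branch safely
```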