Piotr Wilkin (ilintar)
Chat format detection is based on keywords in the template though, not on matching fragments of the actual chat :)
Server logs chat format for messages by default, if I remember correctly. Unless you really need to detect it client-side.
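For the curious, a minimal sketch of what that keyword-based detection amounts to (the real logic is C++ inside llama.cpp; the keyword list below is illustrative, not the actual one):

```python
# Hypothetical sketch of keyword-based chat-template detection.
# The template *source* (Jinja) is matched for marker substrings;
# the actual conversation content is never inspected.
def detect_chat_format(template: str) -> str:
    if "<|im_start|>" in template:
        return "chatml"   # ChatML-style markers
    if "<|start_header_id|>" in template:
        return "llama3"   # Llama 3 header tokens
    if "[INST]" in template:
        return "llama2"   # Llama 2 instruction brackets
    return "unknown"
```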
Okay, so I've actually gone over and verified the conversion results - refactored, applied some tips from here along the way and, most of all, tested that it actually...
Conversion verification script: https://gist.github.com/pwilkin/1e488423e9f2549c0518179bb9f752d5
@CISC have fun! :)
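The gist does the real work, but the core idea is just: load the converted GGUF, sanity-check the tensor inventory against the original checkpoint, then run both and compare outputs. A rough sketch of the first part (not the actual script; file names are made up):

```python
# Rough illustration only -- see the gist above for the real thing.
from gguf import GGUFReader
from safetensors import safe_open

reader = GGUFReader("model-f16.gguf")
print(f"GGUF tensors: {len(reader.tensors)}")
for t in reader.tensors[:5]:
    print(t.name, t.shape, t.tensor_type)

# Compare against the original HF checkpoint's tensor inventory.
with safe_open("model.safetensors", framework="np") as f:
    print(f"HF tensors: {len(list(f.keys()))}")
```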
Yeah, might've gone out of draft too early :>
@matteoserva Yeah, the last problem is the killer. Must be some implementation-specific error though, because the Transformers version runs quite well.
FWIW, did perplexity calcs on 50 chunks of [calibration_data_v5_rc.txt](https://github.com/user-attachments/files/19742743/calibration_data_v5_rc.txt) (that I used for the imatrix) and they seem OK:
F16: PPL = 29.9842 +/- 1.09088
Q8_0: PPL = 30.0564 +/-...
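(That's llama.cpp's perplexity tool; the invocation was along these lines - from memory, file name made up, so double-check the flags:)

```
llama-perplexity -m model-f16.gguf -f calibration_data_v5_rc.txt --chunks 50
```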
Can confirm @piDack's PR fixes the issues, reuploading fixed quants now.
@piDack Run the model with llama-server and hit the /props endpoint; you get:
```json
{"error":{"code":500,"message":"vector::_M_range_check: __n (which is 18446744073709551615) >= this->size() (which is 151552)","type":"server_error"}}
```
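(Side note: that `__n` of 18446744073709551615 is (size_t)-1, i.e. a -1 index being range-checked against the 151552-entry vocab-sized vector. To reproduce, just hit the endpoint directly, assuming the server is on its default port:)

```
curl http://localhost:8080/props
```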