Piotr Wilkin (ilintar)


Chat format detection is based on keywords in the template, though, not on matching fragments of the actual chat :)
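
As an illustration (this is a sketch, not llama.cpp's actual detection code), keyword-based detection keys off distinctive markers in the template source itself:

```python
# Illustrative sketch only - not llama.cpp's actual code. Detection looks at
# distinctive keywords in the Jinja template source, not at fragments of a
# rendered conversation.
def detect_chat_format(template: str) -> str:
    if "<|im_start|>" in template:          # ChatML-style templates
        return "chatml"
    if "<|start_header_id|>" in template:   # Llama 3 templates
        return "llama3"
    if "[INST]" in template:                # Llama 2 / Mistral templates
        return "llama2"
    return "unknown"
```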

The server logs the chat format for messages by default, if I remember correctly. Unless you really need to detect it client-side.
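
If you do need it client-side, a minimal sketch, assuming a locally running llama-server on the default port and that /props exposes the loaded template under a `chat_template` key:

```python
import requests

# Assumption: llama-server is running on the default host/port and /props
# returns the loaded chat template under "chat_template".
props = requests.get("http://localhost:8080/props", timeout=10).json()
print(props.get("chat_template", "")[:300])
```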

Okay, so I've actually gone over and verified the conversion results - refactored, applied some tips from here while doing that, and most of all, tested that it actually...

Conversion verification script: https://gist.github.com/pwilkin/1e488423e9f2549c0518179bb9f752d5
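
For context, here's a minimal sketch of the kind of end-to-end check such a script can do (this is not the linked gist; the model paths and the llama-cpp-python dependency are assumptions): compare the top next-token predictions of the original Transformers model against the converted GGUF.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from llama_cpp import Llama

prompt = "The capital of France is"

# Original Transformers model (placeholder id).
tok = AutoTokenizer.from_pretrained("org/original-model")
hf = AutoModelForCausalLM.from_pretrained("org/original-model")
with torch.no_grad():
    logits = hf(**tok(prompt, return_tensors="pt")).logits[0, -1]
print("HF top-5:", tok.convert_ids_to_tokens(torch.topk(logits, 5).indices.tolist()))

# Converted GGUF (placeholder path); top logprobs via the completion API.
gguf = Llama(model_path="original-model.gguf", verbose=False)
out = gguf.create_completion(prompt, max_tokens=1, logprobs=5, temperature=0.0)
print("GGUF top-5:", out["choices"][0]["logprobs"]["top_logprobs"][0])
```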

Yeah, might've gone out of draft too early :>

@matteoserva Yeah, the last problem is the killer. Must be some implementation-specific error though, because the Transformers version runs quite well.

FWIW, I did perplexity calcs on 50 chunks of [calibration_data_v5_rc.txt](https://github.com/user-attachments/files/19742743/calibration_data_v5_rc.txt) (the one I used for the imatrix) and they seem OK: F16: PPL = 29.9842 +/- 1.09088, Q8_0: PPL = 30.0564 +/-...
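
For reference, the Q8_0 delta is tiny relative to the F16 error bar (Q8_0's own stderr is elided in the quote above):

```python
# PPL numbers from the comment above; the Q8_0 stderr was truncated.
f16_ppl, f16_err = 29.9842, 1.09088
q8_ppl = 30.0564

delta = q8_ppl - f16_ppl
print(f"delta = {delta:.4f} ({100 * delta / f16_ppl:.2f}% of F16), "
      f"vs F16 stderr +/- {f16_err:.5f}")  # ~0.24%, well within the noise
```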

Can confirm @piDack's PR fixes the issues, reuploading fixed quants now.

@piDack Run the model with llama-server and go to the /props endpoint; you get:
```
{"error":{"code":500,"message":"vector::_M_range_check: __n (which is 18446744073709551615) >= this->size() (which is 151552)","type":"server_error"}}
```
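
Worth noting: that huge index is just -1 reinterpreted as an unsigned 64-bit size_t, i.e. some code path (which one exactly is an assumption) is using an unset token id of -1 to index the 151552-entry vocab:

```python
# 18446744073709551615 == 2**64 - 1 == (size_t)-1: the out-of-range index
# in the error message is a -1 token id cast to an unsigned 64-bit integer.
n = 18446744073709551615
assert n == 2**64 - 1
assert n == (-1) % 2**64
```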