Piotr Wilkin (ilintar)
Chat format detection is based on keywords in the template though, not on matching fragments of the actual chat :)
Server logs chat format for messages by default, if I remember correctly. Unless you really need to detect it client-side.
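For the curious, a minimal sketch of what that keyword-based detection amounts to (the real logic is C++ inside llama.cpp; the keyword list below is illustrative, not the actual one):

```python
# Hypothetical sketch of keyword-based chat-template detection.
# The template *source* (Jinja) is matched for marker substrings;
# the actual conversation content is never inspected.
def detect_chat_format(template: str) -> str:
    if "<|im_start|>" in template:
        return "chatml"   # ChatML-style markers
    if "<|start_header_id|>" in template:
        return "llama3"   # Llama 3 header tokens
    if "[INST]" in template:
        return "llama2"   # Llama 2 instruction brackets
    return "unknown"
```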
Okay, so I've actually gone over and verified the conversion results - refactored, applied some tips from here along the way and, most of all, tested that it actually...
Conversion verification script: https://gist.github.com/pwilkin/1e488423e9f2549c0518179bb9f752d5
@CISC have fun! :)
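The gist does the real work, but the core idea is just: load the converted GGUF, sanity-check the tensor inventory against the original checkpoint, then run both and compare outputs. A rough sketch of the first part (not the actual script; file names are made up):

```python
# Rough illustration only -- see the gist above for the real thing.
from gguf import GGUFReader
from safetensors import safe_open

reader = GGUFReader("model-f16.gguf")
print(f"GGUF tensors: {len(reader.tensors)}")
for t in reader.tensors[:5]:
    print(t.name, t.shape, t.tensor_type)

# Compare against the original HF checkpoint's tensor inventory.
with safe_open("model.safetensors", framework="np") as f:
    print(f"HF tensors: {len(list(f.keys()))}")
```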
Yeah, might've gone out of draft too early :>
@matteoserva Yeah, the last problem is the killer. Must be some implementation-specific error though, because the Transformers version runs quite well.
FWIW, did perplexity calcs on 50 chunks of [calibration_data_v5_rc.txt](https://github.com/user-attachments/files/19742743/calibration_data_v5_rc.txt) (that I used for the imatrix) and they seem OK:
F16: PPL = 29.9842 +/- 1.09088
Q8_0: PPL = 30.0564 +/-...
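(That's llama.cpp's perplexity tool; the invocation was along these lines - from memory, file name made up, so double-check the flags:)

```
llama-perplexity -m model-f16.gguf -f calibration_data_v5_rc.txt --chunks 50
```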
Can confirm @piDack's PR fixes the issues, reuploading fixed quants now.
@piDack Run the model with llama-server and hit the /props endpoint; you get:
```json
{"error":{"code":500,"message":"vector::_M_range_check: __n (which is 18446744073709551615) >= this->size() (which is 151552)","type":"server_error"}}
```
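(Side note: that `__n` of 18446744073709551615 is (size_t)-1, i.e. a -1 index being range-checked against the 151552-entry vocab-sized vector. To reproduce, just hit the endpoint directly, assuming the server is on its default port:)

```
curl http://localhost:8080/props
```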