Daniel
But q4_1 works well.
I am trying to get a quantized model from this model: multi-qa-MiniLM-L6-cos-v1 on Hugging Face.
I modified the code to adapt it to BertCode with the latest ggml, and it works fine. Maybe it can be solved by upgrading GGML?
> nan results are typically a sign of some float accuracy weirdness. Do you have a very small model? I think the quantization is less accurate the smaller your model...
I opened a pull request and the repo owner has merged it. `git pull` to get the new version; it works on Windows.
Are you working on CodeBERT? My email: [email protected]
For the BERT model, the overhead is calculated as: `model_mem_req += (5 + 16 * n_layer) * 256; // object overhead`. Can anyone explain the meaning? Is 5 the number of extra tensors, ...?
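A minimal sketch of how that count likely breaks down for a BERT encoder (this assumes the usual bert.cpp-style weight layout; the per-layer breakdown below is my reading, not confirmed by the source):

```c
// Hypothetical breakdown of the (5 + 16 * n_layer) tensor count for a
// BERT encoder; the groupings are assumptions based on the standard
// BERT architecture.
#include <stdio.h>

int main(void) {
    const int n_layer = 6;        // e.g. MiniLM-L6 has 6 encoder layers

    // 5 tensors outside the layers: word, position, and token-type
    // embeddings, plus the embedding LayerNorm weight and bias.
    const int n_global = 5;

    // 16 tensors per layer: attention q/k/v/o weights and biases (8),
    // two LayerNorms with weight and bias each (4), and the two
    // feed-forward weights and biases (4).
    const int n_per_layer = 16;

    // Each tensor allocated in a ggml context carries fixed bookkeeping
    // (struct ggml_tensor plus the ggml object header); 256 bytes was a
    // safe upper bound for that in older ggml versions.
    const size_t overhead_per_tensor = 256;

    size_t model_mem_req =
        (size_t)(n_global + n_per_layer * n_layer) * overhead_per_tensor;
    printf("object overhead: %zu bytes\n", model_mem_req);
    return 0;
}
```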
Thanks for your answer :)
I have tested the latest ggml; the 256 should be changed to 512. I don't understand why :(
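Most likely because `struct ggml_tensor` grew in newer ggml (it gained more source slots and op params over time), so the real per-tensor overhead now exceeds 256 bytes. Rather than hard-coding the constant, you could derive it from ggml itself; a sketch, assuming a recent ggml that provides `ggml_tensor_overhead()` (older versions may not have it):

```c
// Sketch: compute the per-tensor overhead from ggml instead of
// hard-coding 256/512. ggml_tensor_overhead() in recent ggml returns
// the object header size plus sizeof(struct ggml_tensor).
#include <stdio.h>
#include "ggml.h"

int main(void) {
    const int n_layer = 6;

    // The tensor struct gained fields over time, which is presumably
    // why 256 bytes stopped being enough and 512 is now needed.
    size_t per_tensor = ggml_tensor_overhead();
    printf("per-tensor overhead: %zu bytes\n", per_tensor);

    size_t model_mem_req = (5 + 16 * (size_t) n_layer) * per_tensor;
    printf("model object overhead: %zu bytes\n", model_mem_req);
    return 0;
}
```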
The output of the converted mobilenetV3 is also wrong.