Update replit inference code to match reference
Fixes the replit example to work with the new version of the model (tested with 1e1a20a5764458844056ca44c25617c91840d8c9 Sun May 28 04:43:38 2023)
Model file uploaded here: https://huggingface.co/lukasmoeller/replit-code-v1-3b-ggml. I will upload a quantized version later.
I didn't see https://github.com/ggerganov/ggml/pull/206, my bad.
This actually changes a bit more than that PR; feel free to close though
I will close #206.
I'm getting a segfault when running this (quantized -f16) on an 800+ token input (example). I pulled the latest Replit model beforehand. My setup is an M1 Max with 64 GB of RAM. Exact message:
command: ./build/bin/replit -m replit-code-v1-3b/ggml-model-f16.bin -f prompt.txt --top_k 0 --top_p 0.95 --temp 0.2 -n 1500
error message: ggml_new_tensor_impl: not enough space in the context's memory pool (needed 268799024, available 268435456)
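For reference, a quick arithmetic check of the two figures in that message. This is only a sketch to put the numbers in perspective, not a diagnosis: the helper name `pool_shortfall` is hypothetical and not part of ggml, and nothing here assumes how the example sizes its buffer.

```python
# Figures copied verbatim from the error message above.
needed = 268_799_024
available = 268_435_456

MIB = 1024 * 1024


def pool_shortfall(needed_bytes: int, available_bytes: int) -> int:
    """Hypothetical helper: bytes by which a request exceeds the pool (0 if it fits)."""
    return max(0, needed_bytes - available_bytes)


shortfall = pool_shortfall(needed, available)

# The available figure is exactly 256 MiB.
print(f"pool size: {available / MIB:.0f} MiB")
# The request overshoots the pool by a few hundred KiB.
print(f"shortfall: {shortfall} bytes ({shortfall / 1024:.1f} KiB)")
```

So the pool is exactly 256 MiB and the failing allocation overshoots it by 363568 bytes, roughly 355 KiB.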