ggml Update replit inference code to match reference

Fixes the replit example to work with the new version of the model (tested with 1e1a20a5764458844056ca44c25617c91840d8c9 Sun May 28 04:43:38 2023)

May 30 '23 17:05 lukasmoellerch

Model file uploaded here: https://huggingface.co/lukasmoeller/replit-code-v1-3b-ggml. I will upload a quantised version later.

May 30 '23 17:05 lukasmoellerch

Didn't see https://github.com/ggerganov/ggml/pull/206, my bad.

May 30 '23 17:05 lukasmoellerch

This actually changes a bit more than that PR; feel free to close though

May 30 '23 17:05 lukasmoellerch

> This actually changes a bit more than that PR; feel free to close though

I will close #206.

May 30 '23 18:05 klosax

I'm getting a segfault when running this (quantized -f16) on an 800+ token input (example). I pulled the latest Replit model beforehand. My setup is an M1 Max with 64 GB. Exact message:

command: ./build/bin/replit -m replit-code-v1-3b/ggml-model-f16.bin -f prompt.txt --top_k 0 --top_p 0.95 --temp 0.2 -n 1500
error message: ggml_new_tensor_impl: not enough space in the context's memory pool (needed 268799024, available 268435456)
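
For scale, here is a quick sketch of the arithmetic behind that message. The two numbers are copied from the error line; the helper name `pool_shortfall` is hypothetical and only for illustration. The available pool works out to exactly 256 MiB, which might point to a fixed-size buffer somewhere in the example, though that part is an assumption and not something the message itself confirms.

```python
# Figures copied verbatim from the ggml_new_tensor_impl error message.
needed = 268_799_024
available = 268_435_456

MIB = 1024 * 1024


def pool_shortfall(needed_bytes: int, available_bytes: int) -> int:
    """Hypothetical helper: bytes by which a request exceeds the pool (0 if it fits)."""
    return max(0, needed_bytes - available_bytes)


shortfall = pool_shortfall(needed, available)

# The pool is exactly 256 MiB; the request overshoots it by a few hundred KiB.
print(f"pool size: {available / MIB:.0f} MiB")
print(f"shortfall: {shortfall} bytes ({shortfall / 1024:.1f} KiB)")
```

So the request misses by about 355 KiB, not by an order of magnitude, which is consistent with the failure only showing up on longer prompts.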

May 31 '23 00:05 darianss