fast-llama
Runs LLaMA with extremely high speed
Which tokenizer should I use for Mistral? stories110M works fine (though it generates a lot of nonsense text), but how do I use a Mixtral GGUF, etc.?
Running `./main -f gguf -c ../text-generation-webui/models/beagle14-7b.Q5_K_M.gguf` fails with `ERROR: [./src/model_loaders/gguf_loader.cpp:263] [load_gguf()] Unsupported file type: 17` followed by `Failed to load model`.
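For reference, GGUF file type 17 is the Q5_K_M quantization in llama.cpp's numbering, so the loader apparently does not handle K-quants. One possible workaround, sketched below under the assumption that you have llama.cpp built and that fast-llama accepts Q8_0 files, is to requantize the model with llama.cpp's `quantize` tool:

```bash
# Hedged sketch using llama.cpp's quantize tool (not part of fast-llama).
# --allow-requantize permits converting an already-quantized file; quality is
# lower than quantizing from the original F16/F32 weights. Paths are examples.
./quantize --allow-requantize \
    ../text-generation-webui/models/beagle14-7b.Q5_K_M.gguf \
    ./beagle14-7b.Q8_0.gguf \
    Q8_0
```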
Llama 2 7B Chat Q8_0 GGUF causes an unknown-token-id error. Command: `./main -c ./llama-2-7b-chat.Q8_0.gguf -j 40 -n 200 -i "Advice "`. Error: `ERROR: [src/model_loaders/gguf_loader.cpp:320] [load_gguf()] Unknown key: tokenizer.ggml.unknown_token_id` followed by `Failed to load model`.
Where can I download the model `./models/cnllama-7b/ggml-model-f32.gguf`?
Building the project fails due to the missing `` header; adding it should allow the project to build. Closes #1
Hi, I've been trying to compile RapidLLaMA, but it seems to have issues. Is the repo still incomplete? I also had to manually build `sleef` with the test units...
Benchmark fast-llama against llama.cpp and mistral.rs with different numbers of CPU cores and different GPUs.
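A minimal sketch of such a benchmark, assuming the binaries are already built and hyperfine is installed. Binary names, flags, and the model path are assumptions, not verified against each project's current CLI; fast-llama's flags are copied from the commands reported in the issues above.

```bash
#!/usr/bin/env bash
# Hedged sketch: compare generation speed across runtimes and thread counts
# using hyperfine. All binary names, flags, and paths are illustrative.
MODEL=./models/llama-2-7b-chat.Q8_0.gguf
PROMPT="Hello"

for THREADS in 1 4 8 16; do
  hyperfine --warmup 1 \
    "./fast-llama/main -c $MODEL -j $THREADS -n 200 -i \"$PROMPT\"" \
    "./llama.cpp/llama-cli -m $MODEL -t $THREADS -n 200 -p \"$PROMPT\""
done
# For GPU runs, llama.cpp can offload layers with -ngl <N>; mistral.rs would
# need its own invocation, which is omitted here since its CLI is not shown
# in this repo.
```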