
Runs LLaMA with extremely high speed

7 fast-llama issues

What tokenizer should I use for Mistral? stories110M is working fine (though it generates a lot of nonsense text), but how do I use a Mixtral GGUF, etc.?

Running `./main -f gguf -c ../text-generation-webui/models/beagle14-7b.Q5_K_M.gguf` fails with:

`ERROR: [./src/model_loaders/gguf_loader.cpp:263] [load_gguf()] Unsupported file type: 17`
`Failed to load model`
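For reference: in the llama.cpp `llama_ftype` numbering that GGUF files carry under the `general.file_type` metadata key, value 17 is `LLAMA_FTYPE_MOSTLY_Q5_K_M`, which matches the `Q5_K_M` file being loaded, so the loader is most likely rejecting K-quant file types it has no kernels for. A minimal sketch of that kind of check follows; the enum values are from llama.cpp, but `is_supported_ftype` is a hypothetical helper, not fast-llama's actual code:

```cpp
#include <cstdint>
#include <cstdio>
#include <unordered_set>

// File-type ids as defined by the llama.cpp llama_ftype enum; GGUF stores
// this value under the "general.file_type" metadata key.
enum class FileType : uint32_t {
    ALL_F32       = 0,
    MOSTLY_F16    = 1,
    MOSTLY_Q4_0   = 2,
    MOSTLY_Q8_0   = 7,
    MOSTLY_Q5_K_M = 17,  // the value rejected in the error above
};

// Hypothetical whitelist check: a loader that lacks K-quant support
// would fail on a Q5_K_M file exactly like this.
bool is_supported_ftype(uint32_t ftype) {
    static const std::unordered_set<uint32_t> supported = {0, 1, 2, 7};
    return supported.count(ftype) != 0;
}

int main() {
    uint32_t ftype = 17;  // in a real loader, read from GGUF metadata
    if (!is_supported_ftype(ftype)) {
        std::fprintf(stderr, "Unsupported file type: %u\n", (unsigned)ftype);
        return 1;
    }
}
```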

Llama 2 7B Chat Q8 GGUF causes an "unknown token id" error.

Command: `./main -c ./llama-2-7b-chat.Q8_0.gguf -j 40 -n 200 -i "Advice "`

`ERROR: [src/model_loaders/gguf_loader.cpp:320] [load_gguf()] Unknown key: tokenizer.ggml.unknown_token_id`
`Failed to load model`
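The loader appears to abort when it meets a metadata key it does not recognize, even though `tokenizer.ggml.unknown_token_id` is a standard, optional key in the GGUF spec. A minimal sketch of the tolerant approach, assuming the metadata has already been parsed into a map; the key names are real GGUF keys, but `load_tokenizer_ids` and its defaults are hypothetical, not fast-llama's actual code:

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Special-token ids from GGUF tokenizer metadata. These keys are optional,
// so a robust loader falls back to defaults rather than aborting when a key
// is absent, and silently skips keys it does not recognize.
struct TokenizerIds {
    int32_t bos = 1;
    int32_t eos = 2;
    int32_t unk = 0;
};

// Hypothetical helper: read special-token ids from already-parsed metadata,
// keeping defaults for keys that are missing.
TokenizerIds load_tokenizer_ids(
        const std::unordered_map<std::string, int32_t>& kv) {
    TokenizerIds ids;
    auto get = [&](const char* key) -> std::optional<int32_t> {
        auto it = kv.find(key);
        if (it == kv.end()) return std::nullopt;
        return it->second;
    };
    if (auto v = get("tokenizer.ggml.bos_token_id"))     ids.bos = *v;
    if (auto v = get("tokenizer.ggml.eos_token_id"))     ids.eos = *v;
    if (auto v = get("tokenizer.ggml.unknown_token_id")) ids.unk = *v;
    return ids;
}

int main() {
    std::unordered_map<std::string, int32_t> kv = {
        {"tokenizer.ggml.unknown_token_id", 0},  // the key the loader rejects
    };
    TokenizerIds ids = load_tokenizer_ids(kv);
    return ids.unk;
}
```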

Where can I download the models, e.g. ./models/cnllama-7b/ggml-model-f32.gguf?

Building the project fails due to the missing `` header. This change should allow the project to build. Closes #1

Hi, I've been trying to compile RapidLLaMA, but it seems to have issues. Is the repo still incomplete? I also had to manually build `sleef` with the test units...

Benchmark fast-llama against llama.cpp and mistral.rs with different numbers of CPU cores and different GPUs.
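For whoever picks this up: the usual headline metric is decode throughput in tokens per second, measured the same way across all three engines. A minimal, engine-agnostic timing sketch; the `generate` callback is a hypothetical stand-in for whichever engine's API is under test:

```cpp
#include <chrono>
#include <cstdio>
#include <functional>

// Measure decode throughput: run a generation callback that produces
// n_tokens tokens and report tokens per second. The callback is a
// hypothetical stand-in for the engine being benchmarked (fast-llama,
// llama.cpp, or mistral.rs bindings).
double tokens_per_second(const std::function<void(int)>& generate,
                         int n_tokens) {
    auto t0 = std::chrono::steady_clock::now();
    generate(n_tokens);
    auto t1 = std::chrono::steady_clock::now();
    double secs = std::chrono::duration<double>(t1 - t0).count();
    return n_tokens / secs;
}

int main() {
    // Dummy workload standing in for real token generation.
    auto dummy = [](int n) {
        volatile long x = 0;
        for (int i = 0; i < n * 1000000; ++i) x += i;
    };
    std::printf("%.1f tok/s\n", tokens_per_second(dummy, 200));
}
```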