fast-llama
Runs LLaMA with extremely high speed
Which tokenizer should I use for Mistral? stories110M works fine (though it generates a lot of nonsense text), but how do I use a Mixtral GGUF, etc.?
Running `./main -f gguf -c ../text-generation-webui/models/beagle14-7b.Q5_K_M.gguf` fails with `ERROR: [./src/model_loaders/gguf_loader.cpp:263] [load_gguf()] Unsupported file type: 17` followed by `Failed to load model`.
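For reference, GGUF file type 17 is the Q5_K_M quantization in llama.cpp's numbering, so the loader apparently does not handle K-quants. One possible workaround, sketched below under the assumption that you have llama.cpp built and that fast-llama accepts Q8_0 files, is to requantize the model with llama.cpp's `quantize` tool:

```bash
# Hedged sketch using llama.cpp's quantize tool (not part of fast-llama).
# --allow-requantize permits converting an already-quantized file; quality is
# lower than quantizing from the original F16/F32 weights. Paths are examples.
./quantize --allow-requantize \
    ../text-generation-webui/models/beagle14-7b.Q5_K_M.gguf \
    ./beagle14-7b.Q8_0.gguf \
    Q8_0
```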
Llama 2 7B Chat Q8_0 GGUF causes an unknown-token-id error. Command: `./main -c ./llama-2-7b-chat.Q8_0.gguf -j 40 -n 200 -i "Advice "`. Error: `ERROR: [src/model_loaders/gguf_loader.cpp:320] [load_gguf()] Unknown key: tokenizer.ggml.unknown_token_id` followed by `Failed to load model`.
Where can I download the model `./models/cnllama-7b/ggml-model-f32.gguf`?
Building the project fails due to the missing `` header; adding it should allow the project to build. Closes #1
Hi, I've been trying to compile RapidLLaMA, but it seems to have issues. Is the repo still incomplete? I also had to manually build `sleef` with the test units...
Benchmark fast-llama against llama.cpp and mistral.rs with different numbers of CPU cores and different GPUs.
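A minimal sketch of such a benchmark, assuming the binaries are already built and hyperfine is installed. Binary names, flags, and the model path are assumptions, not verified against each project's current CLI; fast-llama's flags are copied from the commands reported in the issues above.

```bash
#!/usr/bin/env bash
# Hedged sketch: compare generation speed across runtimes and thread counts
# using hyperfine. All binary names, flags, and paths are illustrative.
MODEL=./models/llama-2-7b-chat.Q8_0.gguf
PROMPT="Hello"

for THREADS in 1 4 8 16; do
  hyperfine --warmup 1 \
    "./fast-llama/main -c $MODEL -j $THREADS -n 200 -i \"$PROMPT\"" \
    "./llama.cpp/llama-cli -m $MODEL -t $THREADS -n 200 -p \"$PROMPT\""
done
# For GPU runs, llama.cpp can offload layers with -ngl <N>; mistral.rs would
# need its own invocation, which is omitted here since its CLI is not shown
# in this repo.
```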