Abed
I managed to run Vicuna 13b using LLM API and used it in Langchain: I've written an app to run llama-based models using docker here: https://github.com/1b5d/llm-api thanks to [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)...
Did you try [llm-api](https://github.com/1b5d/llm-api) for CPU inference? You can simply run a docker container and expose the model through a simple API. You can then use [langchain-llm-api](https://github.com/1b5d/langchain-llm-api) to add...
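For reference, here is a minimal sketch of how a client could talk to the exposed endpoint once the container is up. The port, the `/generate` path, and the payload/response keys are assumptions for illustration, not the actual llm-api schema; [langchain-llm-api](https://github.com/1b5d/langchain-llm-api) wraps this kind of HTTP call in a LangChain-compatible LLM class so it can be dropped into a chain.

```python
import requests

# The base URL, path, and payload/response keys below are placeholders --
# check the llm-api README for the actual request schema.
API_BASE = "http://localhost:8000"

def generate(prompt: str, max_tokens: int = 256) -> str:
    """Send a prompt to the running llm-api container and return the completion."""
    resp = requests.post(
        f"{API_BASE}/generate",
        json={"prompt": prompt, "params": {"max_tokens": max_tokens}},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json().get("text", "")  # assumed response shape: {"text": "..."}

if __name__ == "__main__":
    print(generate("What is the capital of France?"))
```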
> > I managed to run Vicuna 13b using LLM API and used it in Langchain:
> > I've written an app to run llama-based models using docker here:...
Follow-up on the comments above: I've recently updated [llm-api](https://github.com/1b5d/llm-api) so it can run Llama.cpp, GPTQ for Llama, or a generic Hugging Face pipeline. You can easily switch between CPU...
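To make the backend switch concrete, here is a rough sketch of what a config for each family could look like. The keys (`model_family`, `model_path`, `params`, ...) are placeholders chosen for illustration, not the real llm-api config schema; the point is only that switching backends means editing the config file rather than the code.

```python
import yaml  # pip install pyyaml

# Placeholder configs -- key names are illustrative, not the real llm-api schema.
llama_cpp_config = {
    "model_family": "llama.cpp",                   # ggml model, CPU inference
    "model_path": "/models/vicuna-13b.ggml.bin",
    "params": {"n_ctx": 2048, "n_threads": 8},
}

gptq_config = {
    "model_family": "gptq_llama",                  # GPTQ-quantized model, GPU inference
    "model_path": "/models/vicuna-13b-4bit-128g",
    "params": {"device": "cuda:0"},
}

# Write whichever backend the container should load on startup.
with open("config.yaml", "w") as f:
    yaml.safe_dump(llama_cpp_config, f)
```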
I've written an app to run llama-based models using docker here: https://github.com/1b5d/llm-api, thanks to [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) and [llama-cpp](https://github.com/ggerganov/llama.cpp). You can specify the model in the config file, and the app...
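As a sketch of the Docker part: start the container with the model directory and the config file mounted. The published port and the in-container paths are assumptions, and the tag could be swapped for one of the backend-specific tags listed further down; the README has the exact values.

```python
import os
import subprocess

cwd = os.getcwd()

# Port and in-container mount paths are assumptions -- adjust to the README.
subprocess.run(
    [
        "docker", "run", "--rm",
        "-p", "8000:8000",
        "-v", f"{cwd}/models:/models",
        "-v", f"{cwd}/config.yaml:/llm-api/config.yaml",
        "1b5d/llm-api:latest",
    ],
    check=True,
)
```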
There are use cases for both local and remote model inference, I believe: I want to run my models on a remote server, while others might have enough hardware power...
Could you please share the configs you are using for this model?
Btw I just built different images for different BLAS backends:

- OpenBLAS: 1b5d/llm-api:latest-openblas
- cuBLAS: 1b5d/llm-api:latest-cublas
- CLBlast: 1b5d/llm-api:latest-clblast
- hipBLAS: 1b5d/llm-api:latest-hipblas

Could you please let me know if that...
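For the GPU-backed variants, the only difference on the run side (under the same port/mount assumptions as the sketch above) is picking the matching tag and giving the container access to the GPUs, e.g. for the cuBLAS image:

```python
import os
import subprocess

cwd = os.getcwd()

# Same placeholder port/mount assumptions as the earlier sketch; "--gpus all"
# requires the NVIDIA container toolkit on the host.
subprocess.run(
    [
        "docker", "run", "--rm", "--gpus", "all",
        "-p", "8000:8000",
        "-v", f"{cwd}/models:/models",
        "1b5d/llm-api:latest-cublas",
    ],
    check=True,
)
```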
Hey there! Thanks for the feedback. The current implementation can only run models in the [ggml](https://github.com/ggerganov/ggml) format, in order to do inference on CPUs using the llama.cpp lib, but...
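For anyone who wants to see what CPU inference on a ggml model looks like without the API layer, here is a minimal [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) sketch; the model path is a placeholder.

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Load a ggml-format model for CPU inference; the path is a placeholder.
llm = Llama(model_path="./models/vicuna-13b.ggml.bin", n_ctx=2048, n_threads=8)

output = llm(
    "Q: Name the planets in the solar system. A: ",
    max_tokens=64,
    stop=["Q:", "\n"],
)
print(output["choices"][0]["text"])
```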