Abed

19 comments by Abed

I managed to run Vicuna 13b using llm-api and used it in LangChain: I've written an app to run llama-based models using Docker here: https://github.com/1b5d/llm-api thanks to [llama-cpp-python](https://github.com/abetlen/llama-cpp-python)...
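
For context, here is a minimal sketch of calling such a container over HTTP from Python, assuming it listens on port 8000 and exposes a `/generate` endpoint; the endpoint path and JSON fields are assumptions for illustration, not necessarily llm-api's actual interface:

```python
import requests

# Hypothetical call to a locally running llm-api container.
# The "/generate" path and the payload fields are assumptions for illustration.
resp = requests.post(
    "http://localhost:8000/generate",
    json={"prompt": "What is the capital of France?", "params": {"max_tokens": 64}},
    timeout=120,
)
resp.raise_for_status()
print(resp.json())
```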

Did you try [llm-api](https://github.com/1b5d/llm-api) for CPU inference? You can simply run a Docker container and expose the model through a simple API. You can then use [langchain-llm-api](https://github.com/1b5d/langchain-llm-api) to add...
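
To illustrate the LangChain side, here is a generic sketch of a custom LLM wrapper that forwards prompts to such an HTTP endpoint. It uses LangChain's documented `LLM` base class rather than the actual langchain-llm-api interface (see the linked repo for that), and the endpoint path and response field are assumptions:

```python
from typing import Any, List, Optional

import requests
from langchain.llms.base import LLM


class RemoteLLM(LLM):
    """Sketch of a LangChain LLM that delegates generation to a remote HTTP API."""

    endpoint: str = "http://localhost:8000/generate"  # assumed endpoint

    @property
    def _llm_type(self) -> str:
        return "remote-llm-api"

    def _call(self, prompt: str, stop: Optional[List[str]] = None, **kwargs: Any) -> str:
        resp = requests.post(self.endpoint, json={"prompt": prompt}, timeout=120)
        resp.raise_for_status()
        # Assumes the API returns JSON with a "text" field holding the completion.
        return resp.json()["text"]


llm = RemoteLLM()
print(llm("What is the capital of France?"))
```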

> > I managed to run Vicuna 13b using llm-api and used it in LangChain:
> > I've written an app to run llama-based models using Docker here: ...

Follow-up on the comments above: I've recently updated [llm-api](https://github.com/1b5d/llm-api) to be able to run llama.cpp, GPTQ-for-LLaMa, or a generic Hugging Face pipeline. You can easily switch between CPU...
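
For the generic Hugging Face backend, the underlying mechanism is essentially the `transformers` pipeline API. A minimal standalone sketch (using `gpt2` as a stand-in model, not showing llm-api's own wiring):

```python
from transformers import pipeline

# device=-1 runs on CPU; device=0 would target the first GPU instead.
generator = pipeline("text-generation", model="gpt2", device=-1)

result = generator("Hello, my name is", max_new_tokens=20)
print(result[0]["generated_text"])
```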

I've written an app to run llama-based models using Docker here: https://github.com/1b5d/llm-api thanks to [llama-cpp-python](https://github.com/abetlen/llama-cpp-python) and [llama.cpp](https://github.com/ggerganov/llama.cpp). You can specify the model in the config file, and the app...

There are use cases for both local and remote model inference, I believe: I want to run my models on a remote server, while others might have enough hardware power...

Could you please share the configs you are using for this model?

Btw I just built different images for different BLAS backends:

- OpenBLAS: `1b5d/llm-api:latest-openblas`
- cuBLAS: `1b5d/llm-api:latest-cublas`
- CLBlast: `1b5d/llm-api:latest-clblast`
- hipBLAS: `1b5d/llm-api:latest-hipblas`

Could you please let me know if that...
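
If it helps for testing, here is one way to start one of these images from Python with the Docker SDK; the port and the models mount point are assumptions for illustration, so adjust them to your setup:

```python
import docker

client = docker.from_env()

# Assumed: the API listens on port 8000 inside the container and reads models from /models.
container = client.containers.run(
    "1b5d/llm-api:latest-openblas",
    detach=True,
    ports={"8000/tcp": 8000},
    volumes={"/path/to/models": {"bind": "/models", "mode": "rw"}},
)
print(container.id)
```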

Hey there! Thanks for the feedback. The current implementation can only run models in the [ggml](https://github.com/ggerganov/ggml) format in order to do inference on CPUs using the llama.cpp lib, but...
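
For reference, this is roughly what the llama-cpp-python path does under the hood; a minimal sketch assuming a ggml-format model file is already on disk (the path is a placeholder):

```python
from llama_cpp import Llama

# Load a ggml-format model from disk and run CPU inference via llama.cpp.
llm = Llama(model_path="./models/ggml-vicuna-13b-q4_0.bin")  # placeholder path

out = llm("Q: What is the capital of France? A:", max_tokens=32, stop=["Q:"])
print(out["choices"][0]["text"])
```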