
Add support for llama.cpp

Open SumDevv opened this issue 1 year ago • 6 comments

Add support for llama.cpp for local AI inference.

SumDevv avatar Apr 10 '23 04:04 SumDevv

#134 added that, but we haven't released it yet because I haven't been able to test it. Do you think you could test it using the dev branch?
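If anyone else wants to help test before the release, a minimal sketch of installing from the dev branch with pip; the repository URL and branch name here are assumptions, so check the project's GitHub page first:

```shell
# Assumed repo URL and branch name; adjust to the project's actual GitHub home.
pip install git+https://github.com/logspace-ai/langflow.git@dev

# Then start the server as usual:
langflow
```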

ogabrielluiz avatar Apr 10 '23 14:04 ogabrielluiz

Langflow stays stuck on 'thinking' even after 5 minutes with the latest 0.56 build. Also, I don't know why it unsuccessfully runs llama.cpp two times and then gets stuck on the third time (see the log below).

siddhesh@desktop:~/Desktop$ langflow
[16:39:52] INFO [16:39:52] - INFO - Logger set up with log level: 20(info)  logger.py:28
           INFO [16:39:52] - INFO - Log file: logs/langflow.log             logger.py:30
[2023-04-14 16:39:52 +0530] [12703] [INFO] Starting gunicorn 20.1.0
[2023-04-14 16:39:52 +0530] [12703] [INFO] Listening at: http://127.0.0.1:7860 (12703)
[2023-04-14 16:39:52 +0530] [12703] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2023-04-14 16:39:52 +0530] [12715] [INFO] Booting worker with pid: 12715
[2023-04-14 16:39:52 +0530] [12715] [INFO] Started server process [12715]
[2023-04-14 16:39:52 +0530] [12715] [INFO] Waiting for application startup.
[2023-04-14 16:39:52 +0530] [12715] [INFO] Application startup complete.
llama_model_load: loading model from '/home/siddhesh/Desktop/vicuna.bin' - please wait ...
llama_model_load: n_vocab = 32001
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 5120
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 40
llama_model_load: n_layer = 40
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 13824
llama_model_load: n_parts = 2
llama_model_load: type    = 2
llama_model_load: ggml map size = 7759.84 MB
llama_model_load: ggml ctx size = 101.25 KB
llama_model_load: mem required  = 9807.93 MB (+ 3216.00 MB per state)
llama_model_load: loading tensors from '/home/siddhesh/Desktop/vicuna.bin'
llama_model_load: model size = 7759.40 MB / num tensors = 363
llama_init_from_file: kv self size = 800.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |

[the same llama_model_load output repeats verbatim twice more]

lolxdmainkaisemaanlu avatar Apr 14 '23 11:04 lolxdmainkaisemaanlu

Mine behaves the same way, but it isn't stuck; it just takes that long to execute for me.

nsvrana avatar Apr 14 '23 15:04 nsvrana

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar May 29 '23 19:05 stale[bot]

Which model do I use for the LlamaCpp LLM? I have tried several. Where is the documentation for using Langflow?

TaoAthe avatar May 30 '23 01:05 TaoAthe

Could you try what I mentioned in #233? It works here. We've also released a new version that might help with this.
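For context on what the LlamaCpp component does: as far as I understand, it is a thin wrapper over LangChain's LlamaCpp LLM, which expects a local ggml-quantized model file. Below is a minimal sketch of the equivalent direct call, useful for checking that a given model file works at all; the model path and parameter values are placeholders, not Langflow defaults:

```python
# Minimal sketch: load a local ggml model the way Langflow's LlamaCpp
# component does under the hood (via LangChain). The path is a placeholder.
from langchain.llms import LlamaCpp

llm = LlamaCpp(
    model_path="/path/to/ggml-vicuna-q4_0.bin",  # placeholder path
    n_ctx=512,        # context window size
    temperature=0.7,
)

print(llm("Q: Name one use of a local LLM.\nA:"))
```

If this answers but Langflow doesn't, the problem is likely in the flow; if it fails here too, the model file or the llama-cpp-python install is the more likely culprit.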

ogabrielluiz avatar May 30 '23 21:05 ogabrielluiz

Sorry, I have been away trying to find a better GPU. I will try the latest version. Thank you for responding.

TaoAthe avatar Jun 02 '23 23:06 TaoAthe

Has anyone figured out how to run Llama with Langflow? I have tried many approaches and I am still struggling. I have a llama-2-13b model that I converted, built, and quantized with llama.cpp, and it runs well in llama.cpp itself (ggml-model-q4_0.gguf; I also tried ggml-vic7b-q4_0.bin). I created a models directory in the project root and tried both the LlamaCpp and CTransformers components, but I never got a response from the LLM. Can someone please help me?
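One way to narrow this down is to load the same file with llama-cpp-python directly, outside Langflow; if this works but the flow stays silent, the issue is in the component wiring rather than the model. Note that .gguf files require a llama-cpp-python build recent enough to support the GGUF format, while older builds only read the legacy ggml .bin files. A hedged sketch, with the model path as a placeholder:

```python
# Sanity check outside Langflow: load the quantized model directly with
# llama-cpp-python. The path below is a placeholder for your local file.
from llama_cpp import Llama

llm = Llama(model_path="models/ggml-model-q4_0.gguf", n_ctx=512)

out = llm("Q: What is 2 + 2?\nA:", max_tokens=16)
print(out["choices"][0]["text"])
```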

berradakamal avatar Sep 15 '23 10:09 berradakamal