Gary Mulder

154 comments by Gary Mulder

Same here when trying to test caching.

@abetlen sorry - the above is a different error from the one I'm getting, as reported in #90

Are you still having the same issue with langchain?

Can we close this bug as it looks like a Python path issue?

- Most physical systems are hyperthreaded.
- Hyperthreading doesn't seem to improve performance, due to the memory-I/O-bound nature of `llama.cpp`.
- This might not hold for VMs.
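A rough sketch of the thread-count heuristic implied above: since `llama.cpp` tends to be memory-bandwidth bound, running one thread per physical core (rather than per hyperthread) is often fastest. The halving heuristic below assumes 2-way SMT and is an illustration, not part of `llama.cpp` itself.

```python
import os

def recommended_threads() -> int:
    """Rough heuristic for llama.cpp's -t flag on a hyperthreaded box.

    os.cpu_count() reports *logical* CPUs; on a typical 2-way SMT system,
    physical cores are half that. This is an assumption - VMs and
    non-SMT hardware will differ, so treat the result as a starting point.
    """
    logical = os.cpu_count() or 1
    return max(1, logical // 2)
```

On an 8-core/16-thread desktop this would suggest `-t 8`; benchmarking a few values around it is still worthwhile.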

Closing, as it is impossible to support every hardware configuration.

@eiery > With a n_batch of 1024 and no limit llama.cpp works fine with 2k+ length prompts, though with OpenBLAS I don't see a performance improvement in prompt ingestion (still...

@eiery `./perplexity` is still reporting a batch size of 512:

```
$ git log | head -3
commit 7f15c5c477d9933689a9d1c40794483e350c2f19
Author: Georgi Gerganov
Date:   Fri Apr 28 21:32:52 2023 +0300
$...
```

@cmp-nct @slaren EDIT2: Even though I had the `performance` power governor enabled, the PCIe buses were power-saving by dropping to 2.5 GT/s. All results below are with 8 GT/s lanes, across...
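On Linux, the degraded-link condition described above can be detected from sysfs, which exposes `current_link_speed` and `max_link_speed` per PCIe device as strings like `"8.0 GT/s PCIe"`. A minimal sketch of the comparison (the sysfs paths in the docstring are standard, but the helper itself is illustrative):

```python
def link_degraded(current: str, maximum: str) -> bool:
    """Return True if a PCIe link is running below its maximum speed.

    Arguments are strings as read from Linux sysfs, e.g.
    /sys/bus/pci/devices/<dev>/current_link_speed and
    /sys/bus/pci/devices/<dev>/max_link_speed, formatted like
    "2.5 GT/s PCIe" or "8.0 GT/s PCIe".
    """
    def gts(s: str) -> float:
        # First whitespace-separated token is the numeric GT/s value.
        return float(s.split()[0])
    return gts(current) < gts(maximum)
```

A link power-saving at 2.5 GT/s on an 8 GT/s slot, as in the comment above, would report `link_degraded("2.5 GT/s PCIe", "8.0 GT/s PCIe") == True`.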

How about [llama-cpp-python, which offers a web server that aims to act as a drop-in replacement for the OpenAI API](https://github.com/abetlen/llama-cpp-python)?
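A minimal sketch of talking to that server with only the standard library, assuming it was started with something like `python -m llama_cpp.server --model <path-to-model>` and is listening on the default local port 8000 (both assumptions; check the project README for the exact flags):

```python
import json
import urllib.request

def completion_request(prompt: str, max_tokens: int = 16) -> urllib.request.Request:
    """Build a POST to the OpenAI-compatible /v1/completions endpoint.

    The host/port below assume a locally running llama-cpp-python server;
    adjust the URL for your setup.
    """
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode()
    return urllib.request.Request(
        "http://localhost:8000/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

# Sending it (requires the server to actually be running):
# with urllib.request.urlopen(completion_request("Hello")) as resp:
#     print(json.loads(resp.read())["choices"][0]["text"])
```

Because the endpoint mirrors the OpenAI API shape, existing OpenAI client code can usually be pointed at it by changing only the base URL.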