Gary Mulder

154 comments by Gary Mulder

Same here when trying to test caching.

@abetlen sorry - the above is a different error from the one I'm getting, as reported in #90

Are you still having the same issue with langchain?

Can we close this bug as it looks like a Python path issue?

- Most physical systems are hyperthreaded.
- Hyperthreading doesn't seem to improve performance, due to the memory-I/O-bound nature of `llama.cpp`.
- This might not hold for VMs.
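A rough sketch of the thread-count heuristic implied above: since `llama.cpp` tends to be memory-bandwidth bound, running one thread per physical core (rather than per hyperthread) is often fastest. The halving heuristic below assumes 2-way SMT and is an illustration, not part of `llama.cpp` itself.

```python
import os

def recommended_threads() -> int:
    """Rough heuristic for llama.cpp's -t flag on a hyperthreaded box.

    os.cpu_count() reports *logical* CPUs; on a typical 2-way SMT system,
    physical cores are half that. This is an assumption - VMs and
    non-SMT hardware will differ, so treat the result as a starting point.
    """
    logical = os.cpu_count() or 1
    return max(1, logical // 2)
```

On an 8-core/16-thread desktop this would suggest `-t 8`; benchmarking a few values around it is still worthwhile.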

Closing, as it is impossible to support every hardware configuration.

@eiery > With a n_batch of 1024 and no limit llama.cpp works fine with 2k+ length prompts, though with OpenBLAS I don't see a performance improvement in prompt ingestion (still...

@eiery `./perplexity` is still reporting a batch size of 512:

```
$ git log | head -3
commit 7f15c5c477d9933689a9d1c40794483e350c2f19
Author: Georgi Gerganov
Date:   Fri Apr 28 21:32:52 2023 +0300
$...
```

@cmp-nct @slaren EDIT2: Even though I had the `performance` power governor enabled, the PCIe buses were power-saving by dropping to 2.5 GT/s. All results below are with 8 GT/s lanes, across...
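On Linux, the degraded-link condition described above can be detected from sysfs, which exposes `current_link_speed` and `max_link_speed` per PCIe device as strings like `"8.0 GT/s PCIe"`. A minimal sketch of the comparison (the sysfs paths in the docstring are standard, but the helper itself is illustrative):

```python
def link_degraded(current: str, maximum: str) -> bool:
    """Return True if a PCIe link is running below its maximum speed.

    Arguments are strings as read from Linux sysfs, e.g.
    /sys/bus/pci/devices/<dev>/current_link_speed and
    /sys/bus/pci/devices/<dev>/max_link_speed, formatted like
    "2.5 GT/s PCIe" or "8.0 GT/s PCIe".
    """
    def gts(s: str) -> float:
        # First whitespace-separated token is the numeric GT/s value.
        return float(s.split()[0])
    return gts(current) < gts(maximum)
```

A link power-saving at 2.5 GT/s on an 8 GT/s slot, as in the comment above, would report `link_degraded("2.5 GT/s PCIe", "8.0 GT/s PCIe") == True`.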

How about [llama-cpp-python, which offers a web server that aims to act as a drop-in replacement for the OpenAI API](https://github.com/abetlen/llama-cpp-python)?
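A minimal sketch of talking to that server with only the standard library, assuming it was started with something like `python -m llama_cpp.server --model <path-to-model>` and is listening on the default local port 8000 (both assumptions; check the project README for the exact flags):

```python
import json
import urllib.request

def completion_request(prompt: str, max_tokens: int = 16) -> urllib.request.Request:
    """Build a POST to the OpenAI-compatible /v1/completions endpoint.

    The host/port below assume a locally running llama-cpp-python server;
    adjust the URL for your setup.
    """
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode()
    return urllib.request.Request(
        "http://localhost:8000/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

# Sending it (requires the server to actually be running):
# with urllib.request.urlopen(completion_request("Hello")) as resp:
#     print(json.loads(resp.read())["choices"][0]["text"])
```

Because the endpoint mirrors the OpenAI API shape, existing OpenAI client code can usually be pointed at it by changing only the base URL.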