jmtatsch
It's extremely slow on my M1 MacBook (unusable), but quite usable on my 4-year-old i7 workstation. It doesn't work at all on the same workstation inside Docker. Found #767,...
There is a Vicuna model rev1 with some kind of stop fix on 🤗. Maybe that solves your issue?
I run within an Ubuntu container, which works. https://github.com/mkellerman/gpt4all-ui/ runs a Python 3.11 container and it works, so I would guess the issue is not with llama-cpp-python but with your concrete...
Took the opportunity to shrink my own Dockerfile:

```
FROM python:3.10
COPY .devops/requirements.txt requirements.txt
RUN pip install -r requirements.txt && rm -rf requirements.txt
ENTRYPOINT [ "python3", "-m", "llama_cpp.server" ]
```
...
Seems like loading the model already fails. Double-check your model path.
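A minimal sketch of that check via llama-cpp-python's Python API; the model path here is hypothetical, substitute whatever you pass to the server:

```
import os
from llama_cpp import Llama

# Hypothetical model location; replace with your actual path.
model_path = "./models/ggml-vicuna-7b-q4_0.bin"

# Fail early with a clear message instead of a cryptic load error.
if not os.path.exists(model_path):
    raise FileNotFoundError(f"Model file not found: {model_path}")

llm = Llama(model_path=model_path)
```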
Very funny indeed. My Vicuna prefers to answer me in Chinese. With this fix, at least it can do so without erroring out.
I agree the "dummy" caching feature is already really useful. It makes all the difference between me wanting to use it and going to OpenAI instead ;) Regarding a real caching...
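For reference, a minimal sketch of enabling that cache through the Python API (the model path is hypothetical; as I understand it, the cache lets a repeated prompt prefix reuse the already-evaluated state instead of reprocessing every token):

```
from llama_cpp import Llama, LlamaCache

# Hypothetical model path.
llm = Llama(model_path="./models/ggml-vicuna-7b-q4_0.bin")

# Attach an in-memory cache keyed by prompt prefix.
llm.set_cache(LlamaCache())

# The second call with the same prompt should hit the cache.
for _ in range(2):
    out = llm("Q: What is the capital of France? A:", max_tokens=16)
    print(out["choices"][0]["text"])
```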
> I'm thinking of trying to get it to work with my video card, since it is the most high-end part of my PC, but am not quite sure yet...
Maybe I didn't have the patience to really wait it out. However, it wastes a ton of CPU cycles/energy.
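If you do try GPU offloading, a minimal sketch, assuming a llama-cpp-python build compiled with GPU support (e.g. cuBLAS); the layer count and model path are illustrative:

```
from llama_cpp import Llama

# Hypothetical path; n_gpu_layers controls how many transformer
# layers are offloaded to the GPU (0 keeps everything on the CPU).
llm = Llama(
    model_path="./models/ggml-vicuna-7b-q4_0.bin",
    n_gpu_layers=32,
)

print(llm("Hello", max_tokens=8)["choices"][0]["text"])
```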
@dibrale Thanks, that solves the issue for me. However, there are a couple of changes that don't seem necessary to me. Maybe you can explain a bit why you chose...