Lukas Kreussel
@cometyang This is a real head-scratcher for me. I'm guessing `llama.cpp` still works fine on your machine? I'm probably going to downgrade my CUDA version to 11.8 in my WSL instance...
Hm, that's a tricky one. Most likely the hyperparameters of the model diverge from the format defined by `llama.cpp`, but `llm` should be able to handle that. Could you split...
After taking a quick look, the hyperparameters seem valid. This file was probably created by an older version of ggml, where they didn't adjust the tensor metadata size...
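For anyone debugging a similar file, here is a minimal Rust sketch for dumping the magic number and a GPT-J-style hyperparameter block. The field list, its order, and the `model.bin` path are assumptions (they vary per architecture and ggml version), so treat it as a starting point rather than a parser for this exact format:

```rust
use std::fs::File;
use std::io::{self, Read};

/// Hypothetical GPT-J-style hyperparameter block; the exact fields
/// and their order vary per architecture and ggml version.
#[derive(Debug)]
struct Hyperparameters {
    n_vocab: u32,
    n_ctx: u32,
    n_embd: u32,
    n_head: u32,
    n_layer: u32,
    n_rot: u32,
    ftype: u32,
}

fn read_u32(r: &mut impl Read) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    r.read_exact(&mut buf)?;
    Ok(u32::from_le_bytes(buf))
}

fn main() -> io::Result<()> {
    let mut f = File::open("model.bin")?; // placeholder path
    // Legacy ggml files start with a magic number; unversioned files
    // predate the tensor metadata size adjustment mentioned above.
    let magic = read_u32(&mut f)?;
    println!("magic: {magic:#x}");
    let hparams = Hyperparameters {
        n_vocab: read_u32(&mut f)?,
        n_ctx: read_u32(&mut f)?,
        n_embd: read_u32(&mut f)?,
        n_head: read_u32(&mut f)?,
        n_layer: read_u32(&mut f)?,
        n_rot: read_u32(&mut f)?,
        ftype: read_u32(&mut f)?,
    };
    println!("{hparams:?}");
    Ok(())
}
```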
Could we also include some optional generation parameters, which contain default values for some sampling parameters? Or would that be too specific?
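To make that concrete, here is a minimal sketch of what such an optional block could look like; the struct name, fields, and default values are illustrative assumptions, not the actual `llm` API:

```rust
/// Hypothetical optional generation parameters; names and defaults
/// are illustrative, not the real `llm` types.
#[derive(Debug, Clone)]
pub struct GenerationParameters {
    pub temperature: f32,
    pub top_k: usize,
    pub top_p: f32,
    pub repeat_penalty: f32,
}

impl Default for GenerationParameters {
    fn default() -> Self {
        Self {
            temperature: 0.8,
            top_k: 40,
            top_p: 0.95,
            repeat_penalty: 1.1,
        }
    }
}

fn main() {
    // Callers override only the fields they care about.
    let params = GenerationParameters {
        temperature: 0.2,
        ..Default::default()
    };
    println!("{params:?}");
}
```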
I'm pretty sure that either your model isn't a fully compatible GPT-J model or there are differences in the tokenizer. Have you tried to load your converted GPT-J model with...
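One quick way to check the tokenizer side is to tokenize the same string with the tokenizer that shipped with the original model and compare the IDs against what the converted model produces. A minimal sketch using the `tokenizers` crate (the crate choice, version, and `tokenizer.json` path are assumptions):

```rust
// Cargo.toml (assumed): tokenizers = "0.13"
use tokenizers::Tokenizer;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Placeholder path: the tokenizer that shipped with the original model.
    let tok = Tokenizer::from_file("tokenizer.json")?;
    let enc = tok.encode("Hello world", false)?;
    // Compare these IDs against the tokens your converted model produces;
    // any divergence points at the tokenizer rather than the weights.
    println!("{:?}", enc.get_ids());
    Ok(())
}
```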
If you want, you could create a PR containing these changes. The only thing I don't like about it is that `ModelParameters` will then contain the `n_gqa` parameter, which...
In my opinion we should just hack it in for now; GGUF seems to be nearly ready, meaning when we implement it we can clean up the implementation. The most important...
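To make the stop-gap concrete, here is a rough Rust sketch of what "hacking it in" could look like: an optional `n_gqa` override on the load-time parameters, used to derive the number of key/value heads. The struct and function names are illustrative, not the actual `llm` definitions; the only model that currently needs the override is LLaMA-2 70B, which uses `n_gqa = 8`.

```rust
/// Hypothetical sketch: carry `n_gqa` as an optional override on the
/// load-time parameters. Not the real `llm` `ModelParameters`.
#[derive(Debug, Default)]
pub struct ModelParameters {
    /// Grouped-query attention factor; only LLaMA-2 70B needs it (8),
    /// every other architecture can leave it `None`.
    pub n_gqa: Option<usize>,
}

fn n_head_kv(n_head: usize, params: &ModelParameters) -> usize {
    // With grouped-query attention, several query heads share one
    // key/value head: n_head_kv = n_head / n_gqa.
    match params.n_gqa {
        Some(n_gqa) => n_head / n_gqa,
        None => n_head, // classic multi-head attention
    }
}

fn main() {
    let params = ModelParameters { n_gqa: Some(8) };
    // LLaMA-2 70B: 64 query heads sharing 8 key/value heads.
    assert_eq!(n_head_kv(64, &params), 8);
    println!("n_head_kv = {}", n_head_kv(64, &params));
}
```

Once GGUF lands, the value should come from the file's metadata instead of a user-supplied parameter, which is why keeping the override cheap and removable matters.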
Currently only `llama`-based models are accelerated by Metal/CUDA/OpenCL. If you use another architecture, like `gpt-neox`, it will fall back to CPU-only inference. What you are seeing in your stdout...
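As a rough picture of that dispatch (illustrative only, not the actual `llm` code): acceleration support is keyed on the model architecture, and everything that isn't `llama` takes the CPU path.

```rust
/// Hypothetical architectures; only the `llama` family is
/// GPU-accelerated in this sketch, mirroring the behaviour above.
#[derive(Debug)]
enum Architecture {
    Llama,
    GptNeoX,
    GptJ,
}

/// Whether offloading layers to Metal/CUDA/OpenCL is supported.
fn supports_gpu_offload(arch: &Architecture) -> bool {
    matches!(arch, Architecture::Llama)
}

fn main() {
    for arch in [Architecture::Llama, Architecture::GptNeoX, Architecture::GptJ] {
        if supports_gpu_offload(&arch) {
            println!("{arch:?}: offloading layers to GPU");
        } else {
            // Everything else silently falls back to CPU-only
            // inference, even if a GPU backend was compiled in.
            println!("{arch:?}: falling back to CPU");
        }
    }
}
```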
Alright, I can't send you the Dockerfile, but I created a toy example with your own server. Dockerfile:
```
FROM python:3.10

# install
RUN pip3 install llama-cpp-python[server]

# Expose the ports
EXPOSE 8000...
```
That's the strange thing: the Dockerfile listed above works without any problems, but when I try to run my Dockerfile I get the "GLIBC" error. This is my Dockerfile:
```
...
```