Ravindra Marella

63 comments by Ravindra Marella

Which file did you download from [here](https://huggingface.co/TheBloke/WizardCoder-Guanaco-15B-V1.0-GGML/tree/main)? Did you use the same file in Google Colab as well? Are you running it in WSL on your machine? Have you tried...

Can you please try running ctransformers on Windows and see if it works? Are you running Linux in WSL? WSL has less memory allocated than Windows. I'm guessing on...

Hi, it will be hard to debug the error in a notebook. Can you try running the same code in a normal Python script and share the output?

Which model (7B, 13B etc.) and quantization format (Q2, Q4_0, etc.) are you using? If you have enough RAM, you can try using larger models.
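As a rough sanity check for "enough RAM" (my own rule of thumb, not something ctransformers computes for you): a quantized GGML model needs roughly `n_params * bits_per_weight / 8` bytes for the weights, plus some overhead for the context and buffers. The 4.5 bits-per-weight figure for Q4_0 and the 1 GiB overhead below are assumptions for illustration.

```python
def approx_model_ram_gb(n_params_billion: float, bits_per_weight: float,
                        overhead_gb: float = 1.0) -> float:
    """Rough estimate of RAM (GiB) needed to load a quantized model.

    Assumption: weight memory is n_params * bits_per_weight / 8 bytes,
    plus a flat overhead for context and scratch buffers.
    """
    weight_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes / 2**30 + overhead_gb

# A 7B model in Q4_0 (~4.5 bits per weight once quantization scales
# are included) lands somewhere around 4-5 GiB:
print(round(approx_model_ram_gb(7, 4.5), 1))
```

If the estimate is close to (or above) your free RAM, try a smaller model or a more aggressive quantization before assuming the library is at fault.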

@jamesljl do you have a link to download the model? Also, please share the code along with the params you are using.

No, ctransformers doesn't apply any prompt template, as the template depends on the specific model you are using. You should include it in the prompt yourself.
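For example, many instruction-tuned GGML models expect an Alpaca-style template. A minimal sketch of wrapping it yourself (the `format_prompt` helper and the exact template text are assumptions here; check your model's card for the template it was trained with):

```python
def format_prompt(instruction: str) -> str:
    # Hypothetical helper: Alpaca-style template used by many
    # instruction-tuned models. Verify against your model's card.
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )

prompt = format_prompt("Write a Python function that reverses a string.")
# llm = AutoModelForCausalLM.from_pretrained(...)  # then call: llm(prompt)
print(prompt)
```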

Please run the following command and post the output:

```sh
pip show ctransformers nvidia-cuda-runtime-cu12 nvidia-cublas-cu12
```

Make sure you have installed the CUDA libraries using:

```sh
pip install ctransformers[cuda]
```

Nice @matthoffner --- @Spiritdude I have been thinking of adding an OpenAI-compatible API server but haven't had the time to do it. For now, if you want to create your...

I don't think it is correct to change it. Which scripts are you using for converting? Can you please try running the model using https://github.com/cmp-nct/ggllm.cpp and see if it throws the...

`llm(...)` doesn't return until the entire text is generated, whereas `llm.generate(...)` yields tokens one by one as they are generated. Is it exiting without an error and without printing anything? Try using `stream=True`:...
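A sketch of the difference between the two styles. Since loading a real model file isn't possible here, `fake_stream` stands in for the token generator; with ctransformers you would replace it with `llm(prompt, stream=True)`:

```python
def fake_stream():
    # Stand-in for a streaming call such as llm("AI is going to", stream=True),
    # which yields one token at a time as the model produces them.
    for token in ["AI", " is", " going", " to", " change", " everything."]:
        yield token

# Blocking style: nothing is available until the whole text is done.
full_text = "".join(fake_stream())
print(full_text)

# Streaming style: handle each token as soon as it arrives, so you can
# see output immediately instead of waiting for generation to finish.
for token in fake_stream():
    print(token, end="", flush=True)
print()
```

If your script appears to hang or exit silently, the streaming loop makes it obvious whether any tokens are being produced at all.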