Ravindra Marella

63 comments by Ravindra Marella

Which file did you download from [here](https://huggingface.co/TheBloke/WizardCoder-Guanaco-15B-V1.0-GGML/tree/main)? Did you use the same file in Google Colab as well? Are you running it in WSL on your machine? Have you tried...

Can you please try running ctransformers on Windows and see if it works? Are you running Linux in WSL? WSL has less memory allocated than Windows. I'm guessing on...

Hi, it will be hard to debug the error in a notebook. Can you try running the same code in a normal Python script and share the output?

Which model (7B, 13B etc.) and quantization format (Q2, Q4_0, etc.) are you using? If you have enough RAM, you can try using larger models.
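As a rough sanity check for "enough RAM" (my own rule of thumb, not something ctransformers computes for you): a quantized GGML model needs roughly `n_params * bits_per_weight / 8` bytes for the weights, plus some overhead for the context and buffers. The 4.5 bits-per-weight figure for Q4_0 and the 1 GiB overhead below are assumptions for illustration.

```python
def approx_model_ram_gb(n_params_billion: float, bits_per_weight: float,
                        overhead_gb: float = 1.0) -> float:
    """Rough estimate of RAM (GiB) needed to load a quantized model.

    Assumption: weight memory is n_params * bits_per_weight / 8 bytes,
    plus a flat overhead for context and scratch buffers.
    """
    weight_bytes = n_params_billion * 1e9 * bits_per_weight / 8
    return weight_bytes / 2**30 + overhead_gb

# A 7B model in Q4_0 (~4.5 bits per weight once quantization scales
# are included) lands somewhere around 4-5 GiB:
print(round(approx_model_ram_gb(7, 4.5), 1))
```

If the estimate is close to (or above) your free RAM, try a smaller model or a more aggressive quantization before assuming the library is at fault.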

@jamesljl do you have a link to download the model? Also, please share the code along with the params you are using.

No, ctransformers doesn't apply any prompt template, as the template depends on the specific model you are using. You should include it in the prompt yourself.
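For example, many instruction-tuned GGML models expect an Alpaca-style template. A minimal sketch of wrapping it yourself (the `format_prompt` helper and the exact template text are assumptions here; check your model's card for the template it was trained with):

```python
def format_prompt(instruction: str) -> str:
    # Hypothetical helper: Alpaca-style template used by many
    # instruction-tuned models. Verify against your model's card.
    return (
        "Below is an instruction that describes a task. "
        "Write a response that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )

prompt = format_prompt("Write a Python function that reverses a string.")
# llm = AutoModelForCausalLM.from_pretrained(...)  # then call: llm(prompt)
print(prompt)
```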

Please run the following command and post the output:

```sh
pip show ctransformers nvidia-cuda-runtime-cu12 nvidia-cublas-cu12
```

Make sure you have installed the CUDA libraries using:

```sh
pip install ctransformers[cuda]
```

Nice @matthoffner --- @Spiritdude I have been thinking of adding an OpenAI-compatible API server but haven't had the time to do it. For now, if you want to create your...

I don't think it is correct to change it. Which scripts are you using for converting? Can you please try running the model using https://github.com/cmp-nct/ggllm.cpp and see if it throws the...

`llm(...)` doesn't return until the entire text is generated, whereas `llm.generate(...)` yields tokens one by one as they are generated. Is it exiting without an error and without printing anything? Try using `stream=True`:...
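A sketch of the difference between the two styles. Since loading a real model file isn't possible here, `fake_stream` stands in for the token generator; with ctransformers you would replace it with `llm(prompt, stream=True)`:

```python
def fake_stream():
    # Stand-in for a streaming call such as llm("AI is going to", stream=True),
    # which yields one token at a time as the model produces them.
    for token in ["AI", " is", " going", " to", " change", " everything."]:
        yield token

# Blocking style: nothing is available until the whole text is done.
full_text = "".join(fake_stream())
print(full_text)

# Streaming style: handle each token as soon as it arrives, so you can
# see output immediately instead of waiting for generation to finish.
for token in fake_stream():
    print(token, end="", flush=True)
print()
```

If your script appears to hang or exit silently, the streaming loop makes it obvious whether any tokens are being produced at all.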