CHesketh76

Results: 18 comments of CHesketh76

On Linux it happens 100% of the time.

Add-on: This is not just llama.cpp and GGUF files, but also GPTQ files with Transformers. Most of the VRAM can be freed up when unloading using transformers, but...

I could not find a solution to this issue on Ubuntu, but after running this on Fedora 39 the model weights do unload from RAM and VRAM as...
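For reference, a minimal sketch of the unload pattern being discussed, assuming a plain transformers + CUDA setup (the model name is just a placeholder):

```python
import gc
import torch
from transformers import AutoModelForCausalLM

# Load a model onto the GPU (placeholder model name).
model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1").to("cuda")

# Unload: drop every reference, run garbage collection, then release cached CUDA blocks.
del model
gc.collect()
torch.cuda.empty_cache()

# Memory still held by this process; ideally this drops close to zero after unloading.
print(torch.cuda.memory_allocated() / 1e9, "GB still allocated")
```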

Oh, so I can just remove that if I am not using a multimodal model? Also, could I combine LLaVA with my Mistral model to create a multimodal model?

Was this resolved? I am also getting the same error when running this:

```
python -m vllm.entrypoints.openai.api_server --model TheBloke/phi-2-GPTQ --quantization gptq --trust-remote-code
```
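For anyone hitting the same thing: once that server does start, it exposes an OpenAI-compatible endpoint. A minimal client sketch, assuming the default host/port and the `openai` Python package:

```python
from openai import OpenAI

# vLLM's OpenAI-compatible server listens on localhost:8000 by default;
# the API key is not checked unless one was set when the server was launched.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="TheBloke/phi-2-GPTQ",
    prompt="AI is going to",
    max_tokens=64,
)
print(completion.choices[0].text)
```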

Nope; my only fix was to stop using OpenLLM and switch to vLLM.

> Should work already, I've gotten it working with:
>
> ```python
> from ctransformers import AutoModelForCausalLM
>
> # Set gpu_layers to the number of layers to offload to...
> ```
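The quoted snippet is cut off above; a minimal sketch of the `gpu_layers` usage it appears to describe, with a placeholder GGUF repo and file name, would look roughly like this:

```python
from ctransformers import AutoModelForCausalLM

# Set gpu_layers to the number of layers to offload to GPU; it defaults to 0 (CPU only).
# The repo and file names below are placeholders, not the ones from the original comment.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mistral-7B-v0.1-GGUF",
    model_file="mistral-7b-v0.1.Q4_K_M.gguf",
    model_type="mistral",
    gpu_layers=50,
)

print(llm("AI is going to"))
```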

The issue can be reproduced with this code from the README:

```python
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained("/path/to/ggml-model.bin", model_type="gpt2", hf=True)
print(llm("AI is going to"))
```

or in https://colab.research.google.com/drive/1GMhYMUAv_TyZkpfvUI1NirM8-9mCXQyL. Hope this...

I spent all day trying to get Mistral working with ctransformers, but it is returning garbage text on my end. I believe it may be the tokenizer, because ```tokenizer =...
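The tokenizer line is truncated above; what I would expect, going by the ctransformers README's `hf=True` example (the Mistral repo name here is just a placeholder), is roughly:

```python
from ctransformers import AutoModelForCausalLM, AutoTokenizer

# hf=True wraps the GGUF/GGML weights so they can be driven through transformers APIs.
# The repo name is a placeholder.
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mistral-7B-v0.1-GGUF",
    model_type="mistral",
    hf=True,
)

# ctransformers builds the tokenizer from the wrapped model itself;
# pairing the model with the wrong tokenizer is one plausible cause of garbage output.
tokenizer = AutoTokenizer.from_pretrained(model)
```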

Are you able to use ```model.generate(...)```? I have gotten everything to run until I start generating text; then it just runs indefinitely.
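For what it's worth, a minimal sketch of capping `model.generate` so it cannot run forever, again assuming the `hf=True` wrapper and a placeholder model repo:

```python
from ctransformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mistral-7B-v0.1-GGUF",  # placeholder repo
    model_type="mistral",
    hf=True,
)
tokenizer = AutoTokenizer.from_pretrained(model)

inputs = tokenizer("AI is going to", return_tensors="pt")

# max_new_tokens puts a hard cap on generation, so the call cannot run indefinitely
# even if the model never emits an end-of-sequence token.
output_ids = model.generate(inputs["input_ids"], max_new_tokens=64)
print(tokenizer.decode(output_ids[0]))
```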