ctransformers
Python bindings for the Transformer models implemented in C/C++ using the GGML library.
Does ctransformers support Ollama models? How do I specify the model in the code below?
```python
llm = CTransformers(model="***where is the model file for an Ollama model?", model_type="llama", max_new_tokens=512, temperature=0.1)
```
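ctransformers does not appear to have any Ollama-specific integration, but Ollama seems to store pulled models as GGUF blobs on disk, so one workaround is to point the wrapper at that file directly. A minimal sketch, assuming the default Ollama storage location; the blob path is a placeholder, not a real file:
```python
# A sketch, not a supported integration: Ollama appears to store pulled
# models as GGUF blobs under ~/.ollama/models/blobs/, so the LangChain
# CTransformers wrapper can be pointed at that file directly.
from langchain.llms import CTransformers

llm = CTransformers(
    model="/home/user/.ollama/models/blobs/sha256-<hash>",  # placeholder blob path
    model_type="llama",
    max_new_tokens=512,
    temperature=0.1,
)
```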
It would be nice to have active support for Google's Gemma models.
I have converted my finetuned Hugging Face model to .gguf format and triggered inference with ctransformers. I am using a CUDA GPU machine, but I did not observe any...
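For GPU offload, the README requires the CUDA wheel (`pip install ctransformers[cuda]`) and a nonzero `gpu_layers`; with the default CPU wheel or `gpu_layers=0`, no GPU activity is expected. A minimal sketch, assuming a hypothetical local path for the converted model:
```python
from ctransformers import AutoModelForCausalLM

# Requires the CUDA build: pip install ctransformers[cuda]
llm = AutoModelForCausalLM.from_pretrained(
    "path/to/finetuned-model.gguf",  # hypothetical path to the converted model
    model_type="llama",
    gpu_layers=50,  # number of layers to offload; 0 keeps everything on CPU
)
print(llm("Hello"))
```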
I have finetuned the Mistral base model with my data using LoRA PEFT. Base model tried: **mistralai/Mistral-7B-Instruct-v0.2**. Finetuned merged model folder structure: All model files are...
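Once the merged checkpoint has been converted to GGUF, it can be loaded from the local file rather than a Hub repo. A sketch, where the output path is a hypothetical result of the conversion step:
```python
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "merged-model/mistral-finetuned.Q4_K_M.gguf",  # hypothetical converted file
    model_type="mistral",  # Mistral shares the Llama architecture; "llama" may also work
)
print(llm("Hello"))  # prompt format depends on the finetune
```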
Hey, I wanted to test the Mistral-7b model and tried to run the following code:
```python
from ctransformers import AutoModelForCausalLM, AutoConfig, Config

conf = AutoConfig(Config(temperature=0.8, repetition_penalty=1.1,
                         batch_size=52, max_new_tokens=1024,
                         context_length=2048))
llm =...
```
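For reference, a sketch of how such a call is typically completed, following the README pattern; the model path below is a placeholder, not taken from the report:
```python
from ctransformers import AutoModelForCausalLM, AutoConfig, Config

conf = AutoConfig(Config(temperature=0.8, repetition_penalty=1.1,
                         batch_size=52, max_new_tokens=1024,
                         context_length=2048))
llm = AutoModelForCausalLM.from_pretrained(
    "path/to/mistral-7b.Q4_K_M.gguf",  # placeholder local GGUF file
    model_type="llama",
    config=conf,
)
print(llm("AI is going to"))
```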
When I ran the model, I found that `context_length` has no effect. I tried to fix it the same way as https://github.com/marella/ctransformers/blob/ed02cf4b9322435972ff3566fd4832806338ca3d/ctransformers/gptq/llm.py#L213
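Besides patching the source, `context_length` can usually be set at load time, since keyword arguments to `from_pretrained` are applied to the config. A sketch with a hypothetical model path:
```python
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "path/to/model.gguf",   # hypothetical local file
    model_type="llama",
    context_length=2048,    # applied to the config at load time
)
```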
I am using a Linux system with a GPU and installed ctransformers-0.2.14 using pip. It installed fine. But now when I try to run the GGML model quantized by @TheBloke...
Hello, I'm trying to use ctransformers as below:
```python
from ctransformers import AutoModelForCausalLM

# Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no...
```
A new set of 7B foundation models that claim to beat all 13B Llama 2 models in benchmarks.
https://huggingface.co/mistralai/Mistral-7B-v0.1
https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.1
Is this easy to implement?
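Mistral-7B uses the Llama architecture, so existing GGUF conversions generally load without new code. A sketch, where the community repo and file names are assumptions:
```python
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mistral-7B-Instruct-v0.1-GGUF",           # assumed community GGUF repo
    model_file="mistral-7b-instruct-v0.1.Q4_K_M.gguf",  # assumed file name
    model_type="mistral",  # "llama" may also work, as the architectures match
)
print(llm("AI is going to"))
```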
I load the model to GPU like this:
```python
import torch
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "LLM-model",
    model_file="vinallama-7b-chat_q5_0.gguf",
    config=config,
    torch_dtype=torch.float16,
    hf=True,
    gpu_layers=100,
    device_map='cuda',
)
```
and generate code like this:
```python
generated_ids = llm.generate(**model_inputs,...
```
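For comparison, the README's `hf=True` pattern pairs the loaded model with a tokenizer and a transformers pipeline; torch-specific arguments such as `torch_dtype` and `device_map` are not part of the documented ctransformers API. A sketch following that pattern, reusing the path and file name from the report above:
```python
from ctransformers import AutoModelForCausalLM, AutoTokenizer
from transformers import pipeline

model = AutoModelForCausalLM.from_pretrained(
    "LLM-model",                              # path from the report above
    model_file="vinallama-7b-chat_q5_0.gguf",
    gpu_layers=100,
    hf=True,  # expose a transformers-compatible model
)
tokenizer = AutoTokenizer.from_pretrained(model)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(pipe("Hello", max_new_tokens=32)[0]["generated_text"])
```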