Peji-moghimi
@dvmazur @lavawolfiee Could you kindly address this question? I'd be happy to implement this myself if it isn't already possible (which I don't think it is), if you could...
Hi @dvmazur! Thank you for your reply. Unfortunately (or fortunately) I have 8 1080 Ti GPUs on my machine, none of which can individually handle the model, even with quantization and...
> > May I ask which quantization setup allowed compression down to 17GB, or if you could point me to a file that contains that setup please? > > It's...
> > the model seems to only occupy ~11GB on a single GPU without an OOM error, but then at inference there's no utilization of the GPU cores throughout (though...
I also have the same problem, even when simply running the `promptify_NER.ipynb` example notebook! For ease of reference, here is the code snippet:

```python
from promptify import Prompter, OpenAI, Pipeline

model ...
```
Weirdly, it turns out that if the input is wrapped in triple quotes it runs just fine, and in a very short span of time.
I also need to use this with llama-cpp-python.