DanielusG
Is there any update about this?
After many attempts I could not get the role chat to work. I've used this code:

```
import re
import guidance

# define the model we will use
settings = ...
```
I manually modified the code (I haven't forked or committed yet) and managed to make it work with llama.cpp; I used the GPT4All 13B model. It lends itself well to...
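Roughly, the idea is something like this: a minimal sketch of role-based chat going through llama-cpp-python's chat API directly (the model filename and parameter values below are only placeholders, not the exact code I changed):

```python
from llama_cpp import Llama

# Placeholder model path and context size; point this at your local GGML model.
llm = Llama(
    model_path="models/ggml-gpt4all-13b.bin",
    n_ctx=1024,
)

# llama-cpp-python exposes an OpenAI-style chat completion API,
# which is essentially what the role chat boils down to.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain in one sentence what llama.cpp does."},
    ],
    max_tokens=128,
)
print(response["choices"][0]["message"]["content"])
```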
> How does adding n_gpu_layers and use_mlock help with performance?
>
> `llm = LlamaCpp(model_path=model_path, n_ctx=model_n_ctx, callbacks=callbacks, verbose=False, n_gpu_layers=n_gpu_layers, use_mlock=use_mlock, top_p=0.9, n_batch=1024)`

If the user has an Nvidia GPU, part of the...
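In other words, the LangChain wrapper just forwards those options to llama-cpp-python. A standalone sketch of what that call looks like (the path and the layer count are only examples; pick whatever fits in your VRAM):

```python
from langchain.llms import LlamaCpp

# Example values only; n_gpu_layers requires llama-cpp-python built with cuBLAS.
llm = LlamaCpp(
    model_path="models/ggml-gpt4all-13b.bin",  # placeholder path
    n_ctx=1024,
    verbose=False,
    n_gpu_layers=12,  # offload 12 transformer layers to the GPU
    use_mlock=True,   # pin the weights in RAM so the OS cannot swap them out
    top_p=0.9,
    n_batch=1024,     # larger batches speed up prompt evaluation
)

print(llm("What does llama.cpp do?"))
```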
> Could you elaborate on the "issue that made the prompt evaluation slow" that you fixed? I had also opened an issue about it: #493

Basically, after a while of...
@imartinez As I understand it, many people may not have sufficient computing power to run this code, so if you want I can create a new branch in my fork and...
> for reference, after running the requirements, I still had to install the following (on clean environment):
>
> * python -m pip install python-dotenv
> * pip install tqdm...
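For context, `python-dotenv` is what lets the script read its settings from the `.env` file, roughly like this (the GPU-related variable names below are only an example of how the new options could be exposed, not the project's official settings):

```python
import os
from dotenv import load_dotenv

# Reads key=value pairs from a local .env file into the environment.
load_dotenv()

model_path = os.environ.get("MODEL_PATH")
model_n_ctx = int(os.environ.get("MODEL_N_CTX", "1024"))

# Illustrative names for the GPU options, not official privateGPT settings.
n_gpu_layers = int(os.environ.get("N_GPU_LAYERS", "0"))
use_mlock = os.environ.get("USE_MLOCK", "false").lower() == "true"
```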
> If I am not mistaken 87% increase in speed. It moved from 24 seconds to 3 Seconds

Yes, you are right, but expressed like this it doesn't quite give the...
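For reference, the arithmetic: going from 24 s to 3 s is 24 / 3 = 8× faster; equivalently, the response time drops by (24 − 3) / 24 ≈ 87.5%. So the ~87% figure describes the reduction in time, while the speed itself is roughly 8 times higher.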
> Thanks for the detailed info @DanielusG! I'll be running some more tests before merging, feel free to keep it as a branch on your repo and evolve it further,...
> where the previous result had:
>
> llama_model_load_internal: [cublas] offloading 12 layers to GPU
> llama_model_load_internal: [cublas] total VRAM used: 2722 MB
>
> is it GPU dependent, the one...