dogjamboree
The same is true in macOS Ventura, or at least in the tray. It seems to work elsewhere. Edit: I noticed that once I resize the window down to where...
I can confirm this behavior (I was about to open a ticket). M2 Pro, 32 GB RAM. I tried adding "Custom stopping strings", but honestly I'm not sure how that's supposed to work...
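For anyone else puzzling over it: my rough understanding is that the UI watches the streamed output for those strings and cuts generation off as soon as one appears. A minimal sketch of that idea, with made-up names and example values rather than oobabooga's actual internals:

```python
# Hypothetical illustration of how custom stopping strings are typically
# applied during streaming generation; names and values here are my own
# examples, not oobabooga's real code or defaults.

STOP_STRINGS = ["\nUser:", "### Human:"]  # example stop strings

def check_stop(generated_text, stop_strings=STOP_STRINGS):
    """If any stop string has appeared in the output so far, return the
    text truncated just before it; otherwise return None (keep generating)."""
    for s in stop_strings:
        idx = generated_text.find(s)
        if idx != -1:
            return generated_text[:idx]
    return None

# Example: generation halts once the model starts hallucinating a "User:" turn.
text = "Sure, here's the answer.\nUser: what about..."
print(check_stop(text))  # -> "Sure, here's the answer."
```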
> use vicuna v1.1

Thanks for the suggestion, but same result 🤔 It happens in llama.cpp too, just not nearly as badly. At least there I'm able to ^C to interrupt...
Thanks for taking the time to write those detailed instructions. I'll give it a shot!
Awesome -- I was having trouble figuring out how to attach here, haha.
Yes, I too have an M2 Pro (32 GB RAM) and it works, just very, very slowly. Even the 30B model loaded, but I mostly tried 13B.
If I understand correctly, repeat_last_n works by penalizing any of the tokens (words) seen in the last n, discouraging the model from generating them again, so if you set it as high as something like 2048, the...
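To make that concrete, here's a minimal Python sketch of how a repeat_last_n-style penalty typically works in llama.cpp-style samplers (my own illustration; the function name and exact defaults are assumptions, though the divide-positive/multiply-negative trick mirrors the usual repetition-penalty formulation):

```python
# Sketch of a repeat_last_n-style repetition penalty: tokens that appeared
# in the last `repeat_last_n` generated tokens get their logits penalized,
# making them less likely (not strictly impossible) to be sampled again.

def apply_repeat_penalty(logits, recent_tokens, repeat_last_n=64, repeat_penalty=1.1):
    """Penalize logits of token ids seen in the last `repeat_last_n` tokens."""
    window = recent_tokens[-repeat_last_n:]
    for tok in set(window):
        if logits[tok] > 0:
            logits[tok] /= repeat_penalty   # shrink positive logits
        else:
            logits[tok] *= repeat_penalty   # push negative logits further down
    return logits

# Example: tokens 0 and 3 were generated recently, so they get penalized.
logits = [2.0, -1.0, 0.5, 3.0]
print(apply_repeat_penalty(logits, recent_tokens=[0, 3], repeat_last_n=2))
# -> [~1.82, -1.0, 0.5, ~2.73]
```

With a very large window like 2048, nearly the entire context gets penalized, which is presumably why it changes the output so much.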
I use either llama.cpp or oobabooga with versions of the model freely available on HuggingFace.
llama.cpp supports it... I mean, it responds with answers that show it understood my request, such as saying "I'm sorry for my previous request, here's what you requested..."
Are there any future plans to address this issue? Or does it even seem fixable? I just bought an M2 Ultra with 128 GB RAM hoping it would be a great...