generic-username0718
I'm running llama 65b on dual 3090s, and at longer contexts I'm noticing seriously long context load times (the delay between sending a prompt and tokens actually being received/streamed). It...
### Describe the bug

Load LoRA on desktop. LoRA says None on phone. Try changing LoRA to alpaca on phone. Reloads llama completely? Still says LoRA = None on phone......
### Describe the bug

Generation attempts clear the chat response.

### Is there an existing issue for this?

- [X] I have searched the existing issues

### Reproduction

python3 server.py...
New Model out. Any chance it'll be supported by you guys?