David_

3 comments by David_

> It's mostly working, but with
>
> * chat mode `--cai-chat`
> * `stop generating at new line character?` unset
>
> there is a memory leak...

~~Update: Waiting about 20 seconds between runs seems to prevent, or at least reduce the chance of, the VRAM usage skyrocketing. Perhaps it takes time to clear the cache after generating, and running generation interrupts...

Tested the latest commit; VRAM usage no longer seems to skyrocket. The streaming is reasonably fast.