AlpinDale
Sorry for the late response, @davideuler. Please set `--enforce-eager` as well to save memory, since it doesn't seem to play very well with GGUF models. If you're still experiencing issues,...
Does it display another error when you kill the server with Ctrl + C?
The most likely cause of that error is an OOM error, so you may need to lower your number of threads.
Added in v0.6.0.
Thanks for the request. As we discussed in private, mirostat support is on the way. After that, we'll focus on a few more deterministic samplers, then we can finally move...
At the moment, we've added support for:
- mirostat
- exllamav2 (though not the variable bitrates, GPTQ only)

CFG support is not planned and will likely not happen in the...
Re-opening this issue so we can keep track of CFG support. After discussing internally, we decided to add it (but not as a high priority addition). We'll likely need to...
I noticed there was a PR for vLLM which streamlined the quantization stuff a lot better. I'll probably update this PR to follow that.
Thanks for reminding me @Vaibhavs10 ! I'll work on this again tonight and hopefully we can finish it up.
It works, but it seems to produce slightly incoherent text. Needs to be investigated.