Nurgl
Try these parameters:
```
#SERVER:
--listen
--listen-port 80
```
Insert these lines into CMD_FLAGS.txt:
```
#SERVER:
--listen
--listen-port 80
```
Same problem with CUDA 12.2 (NVIDIA driver 535) and CUDA 12.4 (NVIDIA driver 550) on Ubuntu 22.04 (loading Mixtral GPTQ). I changed requirements.txt to pin exllamav2 to v0.0.17 and it is working fine now:...
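For reference, the pin described above would be a one-line change in requirements.txt (a sketch of a standard pip version pin; the exact surrounding lines depend on the release you are on):

```
# requirements.txt — pin exllamav2 to the known-good version
exllamav2==0.0.17
```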
TensorRT and Triton Inference Server can reserve memory on several GPUs at once and respond to several users in parallel. Is it possible to bring this functionality over to the...
It would be nice to have both a queue mode and a parallel-processing mode.
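The two modes being requested can be sketched in a few lines of Python (this is an illustration, not code from the project; `handle` is a hypothetical stand-in for model inference):

```python
# Minimal sketch of the two serving modes requested above:
# - queue mode: requests are processed strictly one at a time
# - parallel mode: requests are dispatched to a pool of workers,
#   e.g. one worker per GPU or per loaded model replica
from concurrent.futures import ThreadPoolExecutor


def handle(request: str) -> str:
    # Placeholder for running model inference on a single request.
    return f"response:{request}"


def serve_queued(requests):
    # Queue mode: serialized processing, results in arrival order.
    return [handle(r) for r in requests]


def serve_parallel(requests, workers=4):
    # Parallel mode: up to `workers` requests run concurrently;
    # map() still returns results in the original request order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle, requests))


if __name__ == "__main__":
    reqs = ["a", "b", "c"]
    print(serve_queued(reqs))
    print(serve_parallel(reqs))
```

In a real server these would be wrapped around the same backend, with the mode chosen by a config flag.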