Nurgl

Results 5 comments of Nurgl

Try these parameters:

```
#SERVER:
--listen  # use listen
--listen-port 80
```

Insert these lines into CMD_FLAGS.txt:

```
#SERVER:
--listen --listen-port 80
```

Same problem with CUDA 12.2 (NVIDIA driver 535) and CUDA 12.4 (NVIDIA driver 550) on Ubuntu 22.04 (loading Mixtral GPTQ). I changed requirements.txt to pin exllamav2 to v0.0.17, and it is working fine now:...
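As an illustration, the pin in requirements.txt might look like the line below. This is an assumption about the exact spec: text-generation-webui often pins exllamav2 via prebuilt wheel URLs that vary by CUDA and torch version, so the plain-PyPI form shown here is only a sketch.

```
# Hypothetical plain-PyPI pin to the last known-good exllamav2 release
exllamav2==0.0.17
```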

TensorRT and Triton Inference Server can reserve memory on several video cards at once and respond to several users in parallel. Is it possible to bring this functionality to the...

It would be nice to have both a queue mode and a parallel-processing mode.
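The difference between the two modes can be sketched with a concurrency limit: a cap of one request at a time gives queue mode, while a larger cap gives parallel processing. This is a minimal asyncio sketch, not the project's actual scheduler; `handle`, `serve`, and `max_parallel` are hypothetical names, and the sleep stands in for model inference.

```python
import asyncio


async def handle(request_id: int, gpu_slots: asyncio.Semaphore) -> str:
    # Each request waits for a free slot; with max_parallel=1 this behaves
    # like a queue, with max_parallel>1 several requests run concurrently.
    async with gpu_slots:
        await asyncio.sleep(0.01)  # stand-in for model inference
        return f"response-{request_id}"


async def serve(n_requests: int, max_parallel: int) -> list[str]:
    gpu_slots = asyncio.Semaphore(max_parallel)
    return await asyncio.gather(
        *(handle(i, gpu_slots) for i in range(n_requests))
    )


# Queue mode: one request at a time; parallel mode: up to 4 at once.
queued = asyncio.run(serve(8, max_parallel=1))
parallel = asyncio.run(serve(8, max_parallel=4))
```

Both calls return the same responses in request order; only the throughput differs, which is why supporting the two modes is mostly a scheduling choice rather than an API change.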