rooooc issues


Results: 2 issues by rooooc

Can't start server with a small --max-total-tokens, but it works fine with a big setting

When I try to run CUDA_VISIBLE_DEVICES=0,1,2,3 text-generation-launcher --port 6634 --model-id /models/ --max-concurrent-requests 128 --max-input-length 64 --max-total-tokens 128 --max-batch-prefill-tokens 128 --cuda-memory-fraction 0.95, it says torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate...

question

Different batch sizes lead to different results for the gptq_marlin_gemm kernel.

Hi, I noticed a problem. For the same query, a batch size of 2 produces a different computed output than a batch size of 1. Thought the...
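
For context, one general reason the same input can give slightly different numbers under different batch sizes is that floating-point addition is not associative, so a kernel that groups its partial sums differently can round differently. Whether that is what happens in gptq_marlin_gemm is not confirmed by this issue. The sketch below is plain Python, not the kernel, and the helper names sequential_sum and split_sum are made up for illustration.

```python
# Illustration only: the same three numbers summed with two different groupings.
# This is NOT the gptq_marlin_gemm kernel; helper names are hypothetical.

def sequential_sum(values):
    """Add the values left to right, one at a time."""
    total = 0.0
    for v in values:
        total += v
    return total

def split_sum(values, split):
    """Sum the two halves separately, then add the partial sums."""
    return sequential_sum(values[:split]) + sequential_sum(values[split:])

values = [0.1, 0.2, 0.3]
a = sequential_sum(values)  # grouped as (0.1 + 0.2) + 0.3
b = split_sum(values, 1)    # grouped as 0.1 + (0.2 + 0.3)

print(a == b)      # False: the two groupings round differently
print(abs(a - b))  # a very small difference, on the order of 1e-16
```

A difference of this kind is usually tiny, so comparing outputs with a tolerance rather than exact equality is a reasonable first check.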