rooooc

Results 2 issues of rooooc

When I try to run CUDA_VISIBLE_DEVICES=0,1,2,3 text-generation-launcher --port 6634 --model-id /models/ --max-concurrent-requests 128 --max-input-length 64 --max-total-tokens 128 --max-batch-prefill-tokens 128 --cuda-memory-fraction 0.95, it fails with torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate...

question

Hi, I noticed a problem. For the same query, a batch size of 2 produces a different computed output than a batch size of 1. Thought the...
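
One possible contributor to this kind of mismatch, offered as an assumption and not as a diagnosis of this issue, is that floating-point addition is not associative: if a different batch size causes the same numbers to be summed in a different order, the results can differ in the last bits. The sketch below is plain Python, is not taken from the issue, and uses hypothetical helper names (sum_left_to_right, sum_pairwise) only to show the effect.

```python
# Hypothetical illustration: the same values summed in two different orders.
# Nothing here comes from the original issue or from any inference library.

def sum_left_to_right(values):
    """Accumulate strictly in sequence: ((v0 + v1) + v2) + ..."""
    total = 0.0
    for v in values:
        total += v
    return total


def sum_pairwise(values):
    """Split the list in half and add the two partial sums (tree-style)."""
    if not values:
        return 0.0
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return sum_pairwise(values[:mid]) + sum_pairwise(values[mid:])


values = [0.1, 0.2, 0.3]
sequential = sum_left_to_right(values)  # (0.1 + 0.2) + 0.3
pairwise = sum_pairwise(values)         # 0.1 + (0.2 + 0.3)

print(sequential == pairwise)           # the two orders disagree for these inputs
print(abs(sequential - pairwise))       # the gap is tiny but non-zero
```

Differences of this size are usually harmless on their own, but they can accumulate across many layers, so comparing outputs with a tolerance instead of exact equality is often the more useful check.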