unreadable generation results

Open Fonsifa opened this issue 1 year ago • 1 comments

Problems when using 2 A800 GPUs to serve LLaMA-3-8B

IIRC, there were no other changes besides this. It works fine with one GPU. BTW, what's the difference between actual_gpu_num and gpu_num?

Nov 26 '24 12:11 Fonsifa

We have rewritten the whole codebase from C++ to Python without losing performance in large-batch-size scenarios. The configuration should be clearer now.

Aug 11 '25 20:08 Wazrrr