Nanoflow
unreadable generation results
Problems when using 2 A800 GPUs to serve LLaMA-3-8B
IIRC, no other changes besides this. It works fine with one GPU. BTW, what's the difference between actual_gpu_num and gpu_num?
We have rewritten the whole codebase from C++ to Python without losing performance in large-batch-size scenarios. The configuration should be clearer now.