hejieyuan2005

There is no need to run draft model inference across multiple machines or multiple GPUs. The draft model is relatively small, so single-GPU inference is sufficient. Single-GPU inference achieves the best performance, while...