hejieyuan2005
There is no need to run draft model inference across multiple machines or multiple GPUs. The draft model is relatively small, so single-GPU inference is sufficient. A single GPU achieves the best performance, while...