Does this method have the same benefit when tp=1 or tp=2?

Open CSEEduanyu opened this issue 1 year ago • 1 comment

Does this method have the same benefit when tp=1 or tp=2?

CSEEduanyu avatar Aug 29 '24 14:08 CSEEduanyu

When TP is small, each GPU holds a larger share of the model weights, so almost all of the available GPU memory is occupied by weights. That leaves little room for the KV cache, which reduces the request batch size and makes the batching effect less significant. Reducing TP would therefore greatly harm the system's performance.

serendipity-zk avatar Aug 29 '24 17:08 serendipity-zk
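
To make the memory argument in the reply above concrete, here is a rough back-of-the-envelope sketch. It is not taken from the project's code. All numbers are assumptions chosen for illustration: a 70B-parameter model in fp16 (about 140 GB of weights), 80 GB GPUs, and a hypothetical architecture with 80 layers, 8 KV heads, and a head dimension of 128. It also ignores activations and framework overhead, so real budgets would be tighter.

```python
# Rough estimate of how many tokens of KV cache fit at a given tensor-parallel
# (TP) degree. All default values are illustrative assumptions, not measurements.

def kv_budget_gb(tp, weight_gb=140.0, gpu_mem_gb=80.0):
    """Memory per GPU (GB) left after the weights are sharded over `tp` GPUs."""
    return gpu_mem_gb - weight_gb / tp


def kv_bytes_per_token(layers=80, kv_heads=8, head_dim=128, dtype_bytes=2):
    """Bytes of KV cache one token needs across the whole model (K and V)."""
    return 2 * layers * kv_heads * head_dim * dtype_bytes


def max_cached_tokens(tp, weight_gb=140.0, gpu_mem_gb=80.0):
    """Total tokens whose KV cache fits across all `tp` GPUs (0 if weights do not fit)."""
    free_gb = kv_budget_gb(tp, weight_gb, gpu_mem_gb)
    if free_gb <= 0:
        return 0  # the weights alone exceed GPU memory
    total_free_bytes = free_gb * tp * 1e9  # KV cache is sharded across GPUs too
    return int(total_free_bytes // kv_bytes_per_token())


if __name__ == "__main__":
    for tp in (1, 2, 4, 8):
        print(f"TP={tp}: free per GPU = {kv_budget_gb(tp):.1f} GB, "
              f"cacheable tokens = {max_cached_tokens(tp)}")
```

Under these assumptions the weights do not fit on one GPU at all, and at TP=2 only a small slice of each GPU is left for the KV cache, so far fewer requests can be batched together than at TP=8.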