Does this method have the same benefit when tp=1 or tp=2?

Open · CSEEduanyu opened this issue 1 year ago · 1 comment

Aug 29 '24 14:08 CSEEduanyu

When TP is small, the model weights occupy almost all of the available GPU memory, leaving little room for the KV cache. As a result, the request batch size shrinks and batching becomes much less effective, so reducing TP would greatly harm the system's performance.
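The memory argument can be made concrete with a back-of-the-envelope calculation. This is a rough sketch under stated assumptions, not the method discussed in this issue. It assumes Megatron-style tensor parallelism, where both the weights and the KV cache are split evenly across `tp` GPUs, and it ignores activations, fragmentation, and framework overhead. The numbers are hypothetical: a 24 GB GPU and a ~7B-parameter fp16 model with 14 GB of weights, 32 layers, and hidden size 4096. `Setup` and `kv_tokens_per_gpu` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Setup:
    gpu_mem_gb: float        # memory per GPU (hypothetical)
    weights_gb: float        # total model weights across all shards (hypothetical)
    kv_bytes_per_token: int  # K + V bytes per token for the whole model


def kv_tokens_per_gpu(s: Setup, tp: int) -> int:
    """Per-GPU KV-cache capacity in tokens at tensor-parallel degree `tp`.

    Assumes weights and KV cache are both sharded evenly across the `tp` GPUs
    and ignores activations, fragmentation, and framework overhead.
    """
    free_gb = s.gpu_mem_gb - s.weights_gb / tp  # memory left after this GPU's weight shard
    if free_gb <= 0:
        return 0  # the model does not fit at this TP degree
    # Each GPU stores 1/tp of every token's KV, so the TP group holds free*tp/kv tokens;
    # spread over tp GPUs, that is free/kv tokens per GPU.
    return int(free_gb * 1e9 // s.kv_bytes_per_token)


# Hypothetical ~7B fp16 model: 2 (K and V) * 32 layers * hidden 4096 * 2 bytes per value.
EXAMPLE = Setup(gpu_mem_gb=24.0, weights_gb=14.0, kv_bytes_per_token=2 * 32 * 4096 * 2)

if __name__ == "__main__":
    for tp in (1, 2, 4):
        print(f"tp={tp}: ~{kv_tokens_per_gpu(EXAMPLE, tp)} KV-cache tokens per GPU")
```

Under these assumptions, each GPU can hold roughly 19k KV-cache tokens at tp=1, 32k at tp=2, and 39k at tp=4. A smaller TP degree therefore leaves room for a smaller batch on each GPU, which is the effect described above.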

Aug 29 '24 17:08 serendipity-zk