[Performance Optimization] Multiple channels when getting shuffle data on the client side
Motivation
Currently the executor uses only a single TCP connection to a given shuffle server, so when multiple tasks run concurrently they all share that one channel, which may reduce overall throughput.
Is there any plan to introduce an extra config that lets users create more channels on the client side?
We should probably run some performance tests to show that this change actually helps; any updates will be posted in this ticket.
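To make the idea concrete, here is a minimal sketch of a client-side channel pool. Everything below (the class, the round-robin strategy, the config key `rss.client.grpc.channel.num`) is a hypothetical illustration, not Uniffle's existing API:

```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: a small pool of gRPC channels to a single shuffle
// server, handed out round-robin so concurrent tasks do not all share one
// underlying TCP connection. Names are illustrative, not Uniffle's API.
public class ShuffleChannelPool {
  private final List<ManagedChannel> channels;
  private final AtomicInteger next = new AtomicInteger();

  // poolSize would come from a config such as
  // "rss.client.grpc.channel.num" (an assumed key, not an existing one).
  public ShuffleChannelPool(String host, int port, int poolSize) {
    channels = new ArrayList<>(poolSize);
    for (int i = 0; i < poolSize; i++) {
      channels.add(
          ManagedChannelBuilder.forAddress(host, port)
              .usePlaintext()
              .build());
    }
  }

  // Each caller gets the next channel in round-robin order.
  public ManagedChannel nextChannel() {
    int idx = Math.floorMod(next.getAndIncrement(), channels.size());
    return channels.get(idx);
  }

  public void shutdown() {
    channels.forEach(ManagedChannel::shutdown);
  }
}
```

Whether more channels actually help would need the performance tests mentioned above, since gRPC already multiplexes many streams over a single HTTP/2 connection; the potential win is in avoiding per-connection TCP and flow-control limits.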
@zuston The current implementation limits the number of connections because we don't want too many connections established between the client and the shuffle server. We also plan to improve this by replacing gRPC with Netty, not only to improve throughput but also to avoid unnecessary serialization in gRPC.
Glad to hear this. The flame graph shows that extra memory copies cost a lot of time on the shuffle server side.
If Netty is used to manipulate the shuffle data ByteBufs directly off-heap, it may get better performance.
Besides, do we need to read/write shuffle data files with NIO to match Netty? Please let me know if I'm wrong.
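As a rough sketch of how Netty plus NIO could avoid heap copies on the read path (all class and type names below are illustrative assumptions, not Uniffle's actual server code), a handler could send a shuffle file segment as a `DefaultFileRegion`, which Netty writes via `FileChannel.transferTo`:

```java
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.channel.DefaultFileRegion;
import java.io.RandomAccessFile;

// Hypothetical sketch: respond to a decoded read request by sending a
// region of the shuffle data file with zero-copy transferTo, so bytes move
// from the page cache to the socket without passing through heap buffers.
public class ShuffleDataReadHandler extends ChannelInboundHandlerAdapter {

  @Override
  public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
    // Assume an upstream decoder already turned msg into a request
    // carrying the file path, offset, and length (illustrative type).
    ReadRequest req = (ReadRequest) msg;
    RandomAccessFile file = new RandomAccessFile(req.path(), "r");
    // DefaultFileRegion wraps the NIO FileChannel; Netty releases it
    // (closing the file) once the write completes.
    ctx.writeAndFlush(new DefaultFileRegion(file.getChannel(), req.offset(), req.length()));
  }

  // Illustrative request record, not part of Uniffle.
  public record ReadRequest(String path, long offset, long length) {}
}
```

Note that Netty bypasses zero-copy FileRegion transfers when TLS is in the pipeline; in that case a pooled off-heap ByteBuf (e.g. from `PooledByteBufAllocator`) would be the fallback.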
Could we put this into the 0.7 release plan?
Yes, I think so
So do we have a roadmap of version releases and features on GitHub? @jerqi
Not yet. The main feature of version 0.6 is K8s support.