aibrix
StreamLoader splits tensors more evenly
🚀 Feature Description and Motivation
StreamLoader currently loads tensors one at a time, tensor by tensor, and the total tensor size handled by each thread is uneven. The thread with the largest total size can therefore become the speed bottleneck. If every thread pulled a similar total size, bandwidth utilization should improve.
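To illustrate the idea, here is a minimal sketch of one possible balancing strategy: assign whole tensors to threads greedily, largest first, always giving the next tensor to the thread with the smallest load so far. This is not the StreamLoader implementation. The function names `balance_by_size` and `bucket_loads` and the input format (a dict of tensor name to size in bytes) are assumptions made for illustration only.

```python
import heapq


def balance_by_size(tensor_sizes, num_threads):
    """Assign tensors to threads so per-thread byte totals are roughly equal.

    Hypothetical helper, not part of StreamLoader. Greedy heuristic:
    take tensors from largest to smallest and give each one to the
    thread that currently has the smallest total.
    """
    # Min-heap of (current_load_in_bytes, thread_index).
    heap = [(0, i) for i in range(num_threads)]
    heapq.heapify(heap)
    buckets = [[] for _ in range(num_threads)]

    by_size_desc = sorted(tensor_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in by_size_desc:
        load, idx = heapq.heappop(heap)
        buckets[idx].append(name)
        heapq.heappush(heap, (load + size, idx))
    return buckets


def bucket_loads(tensor_sizes, buckets):
    """Return the total bytes assigned to each thread."""
    return [sum(tensor_sizes[name] for name in bucket) for bucket in buckets]
```

A greedy assignment like this keeps tensors whole, so a single very large tensor can still dominate one thread. Splitting large tensors into byte ranges would be needed to balance that case.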
Use Case
No response
Proposed Solution
No response
The final form should be https://github.com/aibrix/aibrix/issues/401