aibrix
StreamLoader: let each TP process read only its own part of the model file
🚀 Feature Description and Motivation
Currently, each TP (Tensor Parallelism) process in StreamLoader reads all model files in full, even though it only needs its own shard of the weights. This causes duplicate file transfers and reads, which slows down overall model loading. If StreamLoader let each TP process read only the part of the model file that corresponds to its rank, this redundant overhead would be avoided.
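To make the idea concrete, here is a minimal sketch of how a rank could work out which bytes it needs. It is not StreamLoader's API: the function names `shard_byte_range` and `read_shard` and all parameters are hypothetical. It also assumes the tensor is stored contiguously in row-major order, is sharded along dimension 0, and that dimension 0 divides evenly by the TP size. Sharding along other dimensions yields strided rather than contiguous ranges and is not covered here.

```python
import io


def shard_byte_range(data_offset, shape, itemsize, tp_rank, tp_size):
    """Return (start, length) in bytes for this rank's dim-0 slice.

    Hypothetical helper: assumes a row-major tensor stored contiguously
    at `data_offset` in the file and sharded evenly along dimension 0.
    """
    if not 0 <= tp_rank < tp_size:
        raise ValueError("tp_rank must be in [0, tp_size)")
    rows = shape[0]
    if rows % tp_size != 0:
        raise ValueError("dim 0 must be divisible by tp_size in this sketch")
    # Bytes occupied by one row (all trailing dimensions flattened).
    row_bytes = itemsize
    for dim in shape[1:]:
        row_bytes *= dim
    rows_per_rank = rows // tp_size
    start = data_offset + tp_rank * rows_per_rank * row_bytes
    return start, rows_per_rank * row_bytes


def read_shard(stream, data_offset, shape, itemsize, tp_rank, tp_size):
    """Read only this rank's bytes from a seekable binary stream."""
    start, length = shard_byte_range(data_offset, shape, itemsize, tp_rank, tp_size)
    stream.seek(start)
    return stream.read(length)


if __name__ == "__main__":
    # Fake file: 4-byte header followed by a (4, 3) tensor of 2-byte items.
    fake_file = io.BytesIO(bytes(4) + bytes(range(24)))
    for rank in range(2):
        shard = read_shard(fake_file, 4, (4, 3), 2, rank, 2)
        print(rank, len(shard))
```

With two TP ranks, each rank reads half of the tensor's bytes instead of the whole file. For remote storage, the same `(start, length)` pair could be turned into a ranged read, so the duplicate transfer is avoided as well as the duplicate parsing.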
Use Case
No response
Proposed Solution
No response
Let's defer issues about StreamLoader and performance optimization to a later version, such as v0.3.0.