Support device_map in the StreamLoader library to allocate tensors to different devices
🚀 Feature Description and Motivation
In the tensor-loading API of the StreamLoader library, the `device` parameter should be generalized to `device_map`, and the StreamLoader library should support `device_map` to place tensors on different devices.
device_map (Dict[str, Union[int, str, torch.device]], optional) — A map that specifies where each submodule should go. It doesn't need to be refined to each parameter/buffer name; once a given module name is inside, every submodule of it will be sent to the same device.
Refs: https://huggingface.co/docs/accelerate/v1.0.1/en/package_reference/utilities#accelerate.utils.load_state_dict.device_map
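A minimal sketch of the proposed change, assuming a `load_tensors` entry point with a `device` parameter; the function and parameter names below are placeholders for illustration, not the actual StreamLoader API:

```python
import torch

# A device_map maps module-name prefixes to target devices; every
# submodule under a listed prefix is placed on the same device.
device_map = {
    "model.embed_tokens": 0,   # embeddings on GPU 0
    "model.layers": 1,         # all decoder layers on GPU 1
    "lm_head": "cpu",          # offload the output head to CPU
}

# Before (hypothetical current API, single device for all tensors):
# loader.load_tensors(checkpoint_path, device=torch.device("cuda:0"))

# After (proposed, per-module placement):
# loader.load_tensors(checkpoint_path, device_map=device_map)
```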
Use Case
No response
Proposed Solution
The `infer_auto_device_map` function in accelerate could be helpful for constructing the device_map automatically, as sketched below.
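A sketch of how `infer_auto_device_map` could produce a device_map to hand to StreamLoader; the model name, memory budget, and `no_split_module_classes` value are example assumptions:

```python
from accelerate import infer_auto_device_map, init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

# Build the model skeleton without allocating real weights.
config = AutoConfig.from_pretrained("facebook/opt-1.3b")
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

# Let accelerate assign module prefixes to devices under a memory budget.
device_map = infer_auto_device_map(
    model,
    max_memory={0: "10GiB", 1: "10GiB", "cpu": "30GiB"},
    no_split_module_classes=["OPTDecoderLayer"],
)
# device_map is a dict like {"model.decoder.embed_tokens": 0, ..., "lm_head": 1},
# which StreamLoader could then use to place each tensor as it streams weights.
```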
This task may be a prerequisite for https://github.com/aibrix/aibrix/issues/403
Let's defer the issues about StreamLoader and performance optimization to later versions, such as v0.3.0.