intel-extension-for-pytorch
About the number of times the model is loaded
Describe the issue
Hello. Currently, each rank loads the complete model data and then slices out its own tensor shard. For example, with eight ranks, eight full copies of the model are loaded, so the same data is duplicated in memory. This wastes a large amount of memory and can even cause out-of-memory errors. Is there a plan to change this so that only one copy of the model data is loaded and shared across all ranks?
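To illustrate the kind of behavior I am asking about, here is a minimal sketch using only the Python standard library. It is not the extension's API: `write_weights`, `load_shard`, and `demo` are hypothetical names, the "checkpoint" is just a flat file of float32 values, and it assumes the weight count divides evenly by the number of ranks. Each rank memory-maps the file and copies out only its own slice instead of reading the whole file.

```python
import mmap
import os
import tempfile
from array import array

ITEM_SIZE = 4  # bytes per float32 value (assumed layout of the toy checkpoint)


def write_weights(path, values):
    """Write a flat list of float32 weights to disk (stand-in for a checkpoint)."""
    with open(path, "wb") as f:
        array("f", values).tofile(f)


def load_shard(path, rank, world_size):
    """Memory-map the file and copy out only this rank's contiguous slice."""
    total = os.path.getsize(path) // ITEM_SIZE
    per_rank = total // world_size  # assumes total divides evenly by world_size
    start = rank * per_rank * ITEM_SIZE
    end = start + per_rank * ITEM_SIZE
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            shard = array("f")
            shard.frombytes(mm[start:end])  # only this byte range is copied
    return shard


def demo(world_size=8, n_weights=64):
    """Simulate every rank loading its own shard from one shared file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "weights.bin")
        write_weights(path, [float(i) for i in range(n_weights)])
        return [load_shard(path, r, world_size) for r in range(world_size)]


if __name__ == "__main__":
    shards = demo()
    print([len(s) for s in shards])
```

In this sketch each rank ends up holding 1/8 of the values rather than a full copy. A real model checkpoint would of course need per-tensor offsets and the actual sharding layout, which are not shown here.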