
About the number of model load times.

Open · gukejun1 opened this issue 1 year ago · 1 comment

Describe the issue

Hello. Currently, each rank loads the complete model weights and only then splits the tensors into its own shard. For example, with eight ranks, eight full copies of the model are loaded, so the same data is duplicated in memory. This wastes a large amount of memory and can even cause an out-of-memory failure. Is there a plan to change this so that only one copy of the model data is loaded across all ranks?
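To make the concern concrete, here is a rough back-of-the-envelope sketch, not code from intel-extension-for-pytorch. The helper names `host_memory_gib` and `shard_bounds`, the 13 GiB model size, and the even row-wise split are all assumptions made for illustration, and it assumes all ranks run on the same host.

```python
def host_memory_gib(model_gib: float, world_size: int, load_full_per_rank: bool) -> float:
    """Approximate host memory used for weights, summed over all ranks on one host.

    Hypothetical helper: ignores activations, buffers, and framework overhead.
    """
    if load_full_per_rank:
        # Every rank holds a complete copy before splitting its tensors.
        return model_gib * world_size
    # Every rank reads only its own shard, so the shards sum to one copy.
    return model_gib


def shard_bounds(dim_size: int, rank: int, world_size: int) -> tuple[int, int]:
    """Half-open [start, end) row range that `rank` keeps under an even split.

    Assumes dim_size is divisible by world_size (an illustrative simplification).
    """
    if dim_size % world_size != 0:
        raise ValueError("dim_size must be divisible by world_size in this sketch")
    per_rank = dim_size // world_size
    start = rank * per_rank
    return start, start + per_rank


# Hypothetical 13 GiB checkpoint on eight ranks.
print(host_memory_gib(13.0, 8, load_full_per_rank=True))   # 104.0
print(host_memory_gib(13.0, 8, load_full_per_rank=False))  # 13.0
print(shard_bounds(4096, rank=3, world_size=8))            # (1536, 2048)
```

If each rank could read only the rows returned by something like `shard_bounds`, the total footprint would stay near one model copy instead of growing with the number of ranks.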

gukejun1 · Nov 02 '23 01:11