intel-extension-for-pytorch About the number of model load times.

Open gukejun1 opened this issue 1 year ago • 1 comment

Describe the issue

Hello. Currently, each rank loads the complete model weights and only then splits the tensors into per-rank shards. For example, with eight ranks, eight full copies of the model are loaded, so the same data is duplicated in memory. This wastes a large amount of memory and can even cause out-of-memory failures. Is there a plan to change this so that only one copy of the model data is loaded and shared across all ranks?

Nov 02 '23 01:11 gukejun1
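
To make the concern concrete, here is a rough back-of-the-envelope sketch in Python. The 13 GiB model size and the function names are hypothetical, and the sketch assumes that all ranks run on one host and hold their full copy at the same time. It does not use or describe any intel-extension-for-pytorch API.

```python
def full_copies_gib(model_gib: float, ranks: int) -> float:
    """Host memory held if every rank loads the whole model before sharding."""
    return model_gib * ranks


def single_copy_gib(model_gib: float) -> float:
    """Host memory held if the model is loaded once and shared by all ranks."""
    return model_gib


def shard_gib(model_gib: float, ranks: int) -> float:
    """Memory one rank needs for its shard, assuming an even split."""
    return model_gib / ranks


# Hypothetical numbers: a 13 GiB checkpoint split across 8 ranks.
MODEL_GIB = 13.0
RANKS = 8

per_rank_loading = full_copies_gib(MODEL_GIB, RANKS)
shared_loading = single_copy_gib(MODEL_GIB)
one_shard = shard_gib(MODEL_GIB, RANKS)

print(f"every rank loads the full model: {per_rank_loading} GiB")
print(f"one shared copy:                 {shared_loading} GiB")
print(f"what each rank keeps afterwards: {one_shard} GiB")
```

Under these assumptions the duplicated load grows linearly with the number of ranks, while each rank only keeps 1/8 of the data once the split is done.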
