PiPPy
How to reduce memory costs when running on CPU
I am running HF_inference.py on my CPU and it works well: pipeline parallelism runs successfully on CPU. However, I noticed that each rank loads the entire model, which seems unnecessary since each rank only executes a part of it. There must be a way to avoid this, and I would love to solve it. It would be great if the developers of TAU could give me some advice; we can discuss further if you have any ideas. Thanks!