PiPPy
How to reduce memory costs when running on CPU
I am running HF_inference.py on my CPU and it works well: pipeline parallelism runs successfully on CPU. However, I noticed that each rank loads the entire model, which seems unnecessary since each rank only executes a part of it. There must be a way to avoid this, and I would love to solve it. It would be great if the developers of TAU could give me some advice; we can discuss further if you have any ideas. Thanks!