
Why does TP under the Hybrid Parallel Plugin use more GPU memory than DeepSpeed under the same configuration?

Open duomicoding opened this issue 1 year ago • 7 comments

Hello, could you explain why TP under the Hybrid Parallel Plugin uses more GPU memory than DeepSpeed under the same configuration?

duomicoding avatar Dec 17 '24 13:12 duomicoding


It seems to be caused by GPU memory fragmentation, and it is quite severe. Are there any corresponding optimization or mitigation measures?

duomicoding avatar Dec 17 '24 13:12 duomicoding
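One way to check whether fragmentation (rather than raw allocation) is the culprit is to compare the bytes PyTorch's caching allocator has *reserved* against the bytes actually *allocated* by live tensors: a large gap suggests fragmented cached blocks. A minimal sketch (the `fragmentation_ratio` helper is illustrative, not ColossalAI code; the inputs would come from `torch.cuda.memory_allocated()` and `torch.cuda.memory_reserved()`):

```python
def fragmentation_ratio(allocated_bytes: int, reserved_bytes: int) -> float:
    """Fraction of reserved CUDA memory that is not backing live tensors.

    Feed it the values of torch.cuda.memory_allocated(device) and
    torch.cuda.memory_reserved(device); a high ratio at the moment of an
    OOM points to fragmentation rather than genuine over-allocation.
    """
    if reserved_bytes == 0:
        return 0.0
    return (reserved_bytes - allocated_bytes) / reserved_bytes


# Example: 6 GiB live tensors inside 10 GiB reserved -> 40% of the
# reserved pool is fragmented/cached free blocks.
ratio = fragmentation_ratio(6 * 2**30, 10 * 2**30)
```

`torch.cuda.memory_summary()` gives a more detailed breakdown of the same counters.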


DeepSpeed ZeRO-3 fully partitions the weights, whereas TP does not partition everything (e.g., non-Linear/Embedding layers are replicated). When activations are small, this situation can occur. Please provide more detailed information.

ver217 avatar Feb 20 '25 04:02 ver217
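The explanation above can be made concrete with some back-of-the-envelope arithmetic. A rough sketch (illustrative only, not ColossalAI or DeepSpeed code; the parameter counts below are made-up assumptions for a hypothetical ~7B fp16 model on 4 GPUs):

```python
def zero3_weight_bytes(total_params: int, world_size: int,
                       bytes_per_param: int = 2) -> float:
    """ZeRO-3 shards every parameter tensor across all ranks."""
    return total_params * bytes_per_param / world_size


def tp_weight_bytes(sharded_params: int, replicated_params: int,
                    tp_size: int, bytes_per_param: int = 2) -> float:
    """TP shards only certain weights (e.g. Linear/Embedding);
    the rest (LayerNorm, biases, etc.) is replicated on every rank."""
    return (sharded_params / tp_size + replicated_params) * bytes_per_param


# Hypothetical 7B-parameter fp16 model, 4 GPUs:
zero3 = zero3_weight_bytes(7_000_000_000, world_size=4)
# Assume 6.9B params sit in shardable Linear/Embedding layers and
# 0.1B are replicated across all TP ranks:
tp = tp_weight_bytes(6_900_000_000, 100_000_000, tp_size=4)
# tp > zero3: the replicated fraction makes TP's per-GPU weight
# footprint larger, which dominates when activations are small.
```

With these numbers, ZeRO-3 holds about 3.5 GB of weights per GPU versus about 3.65 GB for TP, so even a small replicated fraction tips the balance when activation memory is not the dominant term.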

