
Why does TP under the Hybrid Parallel Plugin use more GPU memory than DeepSpeed under the same configuration?

Open duomicoding opened this issue 1 year ago • 7 comments

Hello, could you explain why TP under the Hybrid Parallel Plugin uses more GPU memory than DeepSpeed under the same configuration?

duomicoding avatar Dec 17 '24 13:12 duomicoding


It seems to be caused by GPU memory fragmentation, and it is quite severe. Are there any corresponding optimization or mitigation measures?

duomicoding avatar Dec 17 '24 13:12 duomicoding
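One way to check whether fragmentation (rather than raw allocation) is the culprit is to compare the bytes PyTorch's caching allocator has *reserved* against the bytes actually *allocated* by live tensors: a large gap suggests fragmented cached blocks. A minimal sketch (the `fragmentation_ratio` helper is illustrative, not ColossalAI code; the inputs would come from `torch.cuda.memory_allocated()` and `torch.cuda.memory_reserved()`):

```python
def fragmentation_ratio(allocated_bytes: int, reserved_bytes: int) -> float:
    """Fraction of reserved CUDA memory that is not backing live tensors.

    Feed it the values of torch.cuda.memory_allocated(device) and
    torch.cuda.memory_reserved(device); a high ratio at the moment of an
    OOM points to fragmentation rather than genuine over-allocation.
    """
    if reserved_bytes == 0:
        return 0.0
    return (reserved_bytes - allocated_bytes) / reserved_bytes


# Example: 6 GiB live tensors inside 10 GiB reserved -> 40% of the
# reserved pool is fragmented/cached free blocks.
ratio = fragmentation_ratio(6 * 2**30, 10 * 2**30)
```

`torch.cuda.memory_summary()` gives a more detailed breakdown of the same counters.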


DeepSpeed ZeRO-3 fully partitions the weights, whereas TP does not partition everything (e.g., non-Linear/Embedding layers are replicated). When activations are small, this situation can occur. Please provide more detailed information.

ver217 avatar Feb 20 '25 04:02 ver217
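The explanation above can be made concrete with some back-of-the-envelope arithmetic. A rough sketch (illustrative only, not ColossalAI or DeepSpeed code; the parameter counts below are made-up assumptions for a hypothetical ~7B fp16 model on 4 GPUs):

```python
def zero3_weight_bytes(total_params: int, world_size: int,
                       bytes_per_param: int = 2) -> float:
    """ZeRO-3 shards every parameter tensor across all ranks."""
    return total_params * bytes_per_param / world_size


def tp_weight_bytes(sharded_params: int, replicated_params: int,
                    tp_size: int, bytes_per_param: int = 2) -> float:
    """TP shards only certain weights (e.g. Linear/Embedding);
    the rest (LayerNorm, biases, etc.) is replicated on every rank."""
    return (sharded_params / tp_size + replicated_params) * bytes_per_param


# Hypothetical 7B-parameter fp16 model, 4 GPUs:
zero3 = zero3_weight_bytes(7_000_000_000, world_size=4)
# Assume 6.9B params sit in shardable Linear/Embedding layers and
# 0.1B are replicated across all TP ranks:
tp = tp_weight_bytes(6_900_000_000, 100_000_000, tp_size=4)
# tp > zero3: the replicated fraction makes TP's per-GPU weight
# footprint larger, which dominates when activations are small.
```

With these numbers, ZeRO-3 holds about 3.5 GB of weights per GPU versus about 3.65 GB for TP, so even a small replicated fraction tips the balance when activation memory is not the dominant term.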

