[BUG]: With Gemini, the number of GPUs must be a power of 2; otherwise assert chunk_size % self.pg_size == 0 fails

Open 1024er opened this issue 2 years ago • 3 comments

🐛 Describe the bug

When using Gemini, the number of GPUs must be a power of 2; otherwise the assertion assert chunk_size % self.pg_size == 0 fails.

The printed chunk_size is 40 MB.
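To make the failing condition concrete, here is a minimal sketch of the divisibility check. It assumes that "40 MB" means 40 * 1024**2 elements and that pg_size equals the total number of GPUs; neither is confirmed in this report, and divides_evenly is a hypothetical helper, not a ColossalAI function.

```python
# Sketch of the check behind `assert chunk_size % self.pg_size == 0`.
# Assumption: "40 MB" is read as 40 * 1024**2 elements.
CHUNK_SIZE = 40 * 1024**2


def divides_evenly(chunk_size: int, pg_size: int) -> bool:
    """Return True when the chunk splits into equal shards across pg_size ranks."""
    return chunk_size % pg_size == 0


if __name__ == "__main__":
    # GPU counts reachable with 8-GPU nodes (assumed: pg_size == total GPUs).
    for world_size in (8, 16, 24, 32, 48):
        status = "ok" if divides_evenly(CHUNK_SIZE, world_size) else "assertion would fail"
        print(f"{world_size} GPUs: {status}")
```

Under these assumptions, GPU counts with a factor of 3 (24 or 48) fail the check, because 40 * 1024**2 has no factor of 3.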

Environment

Multiple nodes, each with 8x 80G A100 GPUs, running the latest code.

1024er avatar Feb 20 '23 10:02 1024er

Bot detected the issue body's language is not English, translate it automatically. 👯👭🏻🧑‍🤝‍🧑👫🧑🏿‍🤝‍🧑🏻👩🏾‍🤝‍👨🏿👬🏿


Title: [BUG]: When using Gemini, the number of GPUs must be a power of 2; otherwise assert chunk_size % self.pg_size == 0 fails

🐛 Describe the bug

When using Gemini, the number of GPUs must be a power of 2; otherwise the assertion assert chunk_size % self.pg_size == 0 fails.

The printed chunk_size is 40 MB.

Environment

Multiple nodes, each with 8x 80G A100 GPUs, using the latest code.

Issues-translate-bot avatar Feb 20 '23 10:02 Issues-translate-bot

@ver217 and @1SAA, can you take a look at this issue? I thought Gemini had implemented padding for chunks, so that 8 elements over 3 devices would be divided as (3, 3, 2), where the 2 is padded to 3.
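A minimal sketch of the padding described here, for illustration only. The helpers padded_chunk_size and shard_layout are hypothetical names, not ColossalAI APIs, and this is not a claim about how Gemini actually implements it.

```python
from typing import List, Tuple


def padded_chunk_size(chunk_size: int, pg_size: int) -> int:
    """Round chunk_size up to the nearest multiple of pg_size (hypothetical helper)."""
    return -(-chunk_size // pg_size) * pg_size  # ceiling division, then scale back up


def shard_layout(num_elements: int, pg_size: int) -> List[Tuple[int, int]]:
    """Return (real_elements, padding) per rank so that every shard has equal length."""
    per_rank = -(-num_elements // pg_size)  # shard length after padding
    layout = []
    remaining = num_elements
    for _ in range(pg_size):
        real = min(per_rank, remaining)  # last ranks may hold fewer real elements
        layout.append((real, per_rank - real))
        remaining -= real
    return layout


if __name__ == "__main__":
    # 8 elements over 3 devices: real sizes 3, 3, 2, with the last shard padded by 1.
    print(shard_layout(8, 3))
    print(padded_chunk_size(8, 3))
```

With a chunk size rounded up this way, chunk_size % pg_size == 0 holds for any number of devices, at the cost of a few unused padding elements in the last shard.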

FrankLeeeee avatar Feb 21 '23 03:02 FrankLeeeee

I'll fix this soon.

1SAA avatar Feb 21 '23 07:02 1SAA