Out-of-memory for default config.

Open chehx opened this issue 1 year ago • 2 comments
Many thanks for open-sourcing this great project.

I am hitting an out-of-memory error during training.

I use the default training config in stage3.py on 2 A100 80G GPUs.

Training still runs out of memory, even though, as I understand it, report 1.1 says the default config is meant for 80G of memory.

Currently, when I use 480p with 48 frames, it takes around 73GB.

chehx avatar May 17 '24 12:05 chehx

I read report 1.1 and it does not state that only 80G of memory is required for training. Where did you see that?

JamesTensor avatar May 19 '24 00:05 JamesTensor

I read report 1.1 and it does not state that only 80G of memory is required for training. Where did you see that?

https://github.com/hpcaitech/Open-Sora/issues/344#issuecomment-2102359347

I saw it in the response linked above. Did I misunderstand something?

chehx avatar May 20 '24 04:05 chehx

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] avatar May 28 '24 01:05 github-actions[bot]

I found the problem!

When I load the pre-trained weights from Hugging Face, the config file from the checkpoint is loaded as well:

https://huggingface.co/hpcai-tech/OpenSora-STDiT-v2-stage3/blob/main/config.json

In that file, flash attention and the layernorm kernel are both disabled:

"enable_flash_attn": false, "enable_layernorm_kernel": false,
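For anyone checking their own setup, here is a minimal sketch of how the two flags could be inspected before training. It only works on a plain dict parsed from JSON. The helper names `disabled_memory_flags` and `with_memory_flags_enabled` are hypothetical and not part of Open-Sora, and how a patched config is passed back to the model loader is not covered here.

```python
import json

# The two flags quoted from the checkpoint's config.json above.
MEMORY_FLAGS = ("enable_flash_attn", "enable_layernorm_kernel")


def disabled_memory_flags(config: dict) -> list[str]:
    """Return the memory-related flags that are missing or false (hypothetical helper)."""
    return [name for name in MEMORY_FLAGS if not config.get(name, False)]


def with_memory_flags_enabled(config: dict) -> dict:
    """Return a copy of the config with both flags set to True (hypothetical helper)."""
    patched = dict(config)
    for name in MEMORY_FLAGS:
        patched[name] = True
    return patched


# Excerpt only: the real config.json contains more keys than these two.
hub_config = json.loads(
    '{"enable_flash_attn": false, "enable_layernorm_kernel": false}'
)

print(disabled_memory_flags(hub_config))  # flags that are off in the excerpt
patched = with_memory_flags_enabled(hub_config)
print(disabled_memory_flags(patched))     # empty once both flags are on
```

The copy in `with_memory_flags_enabled` leaves the downloaded config untouched, so the original values stay available for comparison.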

chehx avatar May 30 '24 12:05 chehx

This issue is stale because it has been open for 7 days with no activity.

github-actions[bot] avatar Jun 07 '24 01:06 github-actions[bot]

I am going to close this issue, since the original poster appears to have resolved it.

JThh avatar Jun 10 '24 16:06 JThh