luchenyu comments


[BUG] Error "exits with return code -7" when finetuning FLANT5-xxl on 8x A100

Setting shm-size to a large value, instead of Docker's default of 64 MB, when creating the Docker container solved the problem in my case. It appears that multi-GPU training relies on the shared...
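The following is a minimal Python sketch, not part of the original comment. It assumes a Linux container with shared memory mounted at /dev/shm and checks whether that mount is still no larger than Docker's 64 MB default before you launch multi-GPU training. The helper names (`shm_size_bytes`, `shm_looks_default`) and the "at or below the default" check are illustrative assumptions. The actual fix is still applied when the container is created, for example by passing a larger value to `docker run --shm-size=...`.

```python
import os
import shutil

# Docker's default /dev/shm size is "64m", i.e. 64 MiB.
DOCKER_DEFAULT_SHM_BYTES = 64 * 1024 * 1024


def shm_size_bytes(path: str = "/dev/shm") -> int:
    """Return the total size in bytes of the filesystem mounted at `path`."""
    return shutil.disk_usage(path).total


def shm_looks_default(path: str = "/dev/shm") -> bool:
    """Return True if the mount at `path` is no larger than Docker's 64 MB default."""
    return shm_size_bytes(path) <= DOCKER_DEFAULT_SHM_BYTES


if __name__ == "__main__":
    if os.path.isdir("/dev/shm"):
        print(f"/dev/shm: {shm_size_bytes() / 2**20:.0f} MiB")
        if shm_looks_default():
            # Hypothetical warning; recreate the container with a larger --shm-size.
            print("Shared memory is at Docker's 64 MB default; consider a larger --shm-size.")
    else:
        print("/dev/shm not found (not a typical Linux container)")
```

Running the script inside the training container before launching the job makes a too-small shared-memory mount visible up front, instead of only through a crash.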