
distributed: single machine, multiple GPUs

Open zhangxiaofan-star opened this issue 1 year ago • 1 comment

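Hello, and thank you very much for the work in this paper; I learned a lot from it. When I run on a single machine with multiple GPUs using the distributed method, I get the following error (screenshot 1686277466595).

Could you help me with this?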

zhangxiaofan-star, Jun 09 '23 02:06

That's interesting. Are you using the DataParallel model for multiple GPUs? I have trained on up to 4 GPUs before with no issue other than needing to change some code from model.attr to model.module.attr.
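For reference, here is a minimal sketch of the model.attr to model.module.attr change mentioned above. It is not code from this repository: TinyModel, hidden_dim, and unwrap are hypothetical names used only for illustration, and the sketch assumes the model is wrapped with torch.nn.DataParallel (or DistributedDataParallel), both of which keep the original model under .module.

```python
import torch
from torch import nn


class TinyModel(nn.Module):
    """Hypothetical stand-in for the real model; `hidden_dim` is an illustrative attribute."""

    def __init__(self, hidden_dim: int = 8):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.linear = nn.Linear(4, hidden_dim)

    def forward(self, x):
        return self.linear(x)


def unwrap(model: nn.Module) -> nn.Module:
    """Return the underlying model whether or not it is wrapped for multi-GPU use."""
    if isinstance(model, (nn.DataParallel, nn.parallel.DistributedDataParallel)):
        return model.module
    return model


model = TinyModel()
wrapped = nn.DataParallel(model)  # acts as a plain pass-through when no GPU is visible

# Custom attributes live on the inner module, not on the wrapper:
# wrapped.hidden_dim raises AttributeError, wrapped.module.hidden_dim works.
hidden_dim = unwrap(wrapped).hidden_dim
```

Routing attribute access through a helper like unwrap lets the same training script run with or without the wrapper, so single-GPU and multi-GPU runs do not need separate code paths.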

However, whenever I have done this, all of the GPUs have been on the same HPC node.

Were you ever able to get this resolved?

parker-sornberger, Nov 15 '23 11:11