
Support for the latest PyTorch 2.7.0 and CUDA 12.8

Open yx0516 opened this issue 6 months ago • 3 comments

Hello,

I am running the Optimize feature in an environment with PyTorch 2.7.0 and CUDA 12.8, and the example data fails with the following error:

Traceback (most recent call last):
  File "/data/PRG/tools/Biomolecules/apps/MEAN/ita_generate.py", line 131, in <module>
    main(args)
  File "/data/PRG/tools/Biomolecules/apps/MEAN/ita_generate.py", line 93, in main
    ppls, seqs, xs, true_xs, aligned = model.infer(batch, device, greedy=False)
  File "/data/PRG/tools/Biomolecules/apps/MEAN/models/MCAttGNN/mc_att_model.py", line 446, in infer
    snll_all, pred_S, pred_X, true_X, cdr_range = self.generate(
  File "/data/PRG/tools/Biomolecules/apps/MEAN/models/MCAttGNN/mc_att_model.py", line 439, in generate
    S[mask] = torch.multinomial(prob, num_samples=1).squeeze()
RuntimeError: CUDA error: device-side assert triggered

The command I ran was: python ita_generate.py --pdb 1ic7.pdb --heavy_chain H --light_chain L --n_samples 100

How should line 439 of mc_att_model.py be changed to work with the latest PyTorch version? The same code runs without errors in an environment with an older PyTorch version.

I would appreciate your help. Thanks, @kxz18 @GrittyChen

yx0516, Jun 04 '25 09:06

Could you first check whether prob looks normal? Judging from this line alone, its behavior does not seem to depend much on the torch version.

kxz18, Jun 04 '25 12:06

How should I check whether prob is normal? Printing prob.shape gives torch.Size([700, 27]), which looks fine?

yx0516, Jun 05 '25 08:06

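I mean whether its contents are normal, for example whether it is all zeros or contains NaN. An upstream operator may have gone wrong and produced an invalid prob, which then triggered this error.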

kxz18, Jun 05 '25 08:06
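
The content check suggested above could be sketched roughly as follows. This is only a minimal illustration, not code from MEAN: `check_prob` is a hypothetical helper name, and it assumes prob is a 2-D tensor of shape (rows, classes) such as the torch.Size([700, 27]) reported earlier. It counts the conditions that torch.multinomial is documented to reject (non-finite values, negative values, rows whose sum is not positive).

```python
import torch


def check_prob(prob: torch.Tensor) -> dict:
    """Count values in `prob` that torch.multinomial would not accept.

    Hypothetical helper for debugging; assumes `prob` has shape
    (num_rows, num_classes).
    """
    # Work on a detached float CPU copy so the checks are independent of the GPU.
    p = prob.detach().float().cpu()
    row_sums = p.sum(dim=-1)
    return {
        "shape": tuple(p.shape),
        "num_nan": int(torch.isnan(p).sum()),
        "num_inf": int(torch.isinf(p).sum()),
        "num_negative": int((p < 0).sum()),
        # Rows that are all zeros (or sum to <= 0) cannot be sampled from.
        "num_zero_sum_rows": int((row_sums <= 0).sum()),
    }


if __name__ == "__main__":
    # Synthetic stand-in for the real tensor, with one row deliberately broken.
    prob = torch.softmax(torch.randn(700, 27), dim=-1)
    prob[0] = float("nan")
    print(check_prob(prob))
```

One way to use it would be to call `print(check_prob(prob))` immediately before the `torch.multinomial(prob, num_samples=1)` line in `generate`. If any count is non-zero, the problem most likely lies in an operator that runs before sampling rather than in the sampling call itself. On CPU, torch.multinomial typically reports invalid inputs as an ordinary RuntimeError rather than a device-side assert, so sampling from a CPU copy of prob may also help narrow things down.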