RocketQA

Why does training fail with this error? (your training script example.py)

Open yangnianen opened this issue 2 years ago • 6 comments

Training script: example.py

Error message:

Traceback (most recent call last):
  File "/opt/qa/RocketQA/examples/example.py", line 66, in <module>
    train_cross_encoder('zh_dureader_ce_v2', './data/cross.train.tsv')
  File "/opt/qa/RocketQA/examples/example.py", line 12, in train_cross_encoder
    cross_encoder = rocketqa.load_model(model=base_model, use_cuda=True, device_id=5, batch_s
  File "/root/anaconda3/lib/python3.9/site-packages/rocketqa/rocketqa.py", line 122, in load_
    encoder = CrossEncoder(**encoder_conf)
  File "/root/anaconda3/lib/python3.9/site-packages/rocketqa/encoder/cross_encoder.py", line
    place = dev_list[device_id]
IndexError: list index out of range

yangnianen, Jun 20 '22 09:06

Hello, I can't get training to work; running the script raises the error above.

yangnianen, Jun 21 '22 01:06

Try setting device_id to 0.

sfwydyc, Jun 21 '22 05:06
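The traceback fails at `place = dev_list[device_id]` with `device_id=5`, which suggests that fewer than six GPUs were visible to the process. A minimal sketch of guarding against this before calling `rocketqa.load_model` is shown here. The helper `resolve_device_id` and the placeholder `dev_list` are hypothetical and not part of RocketQA, and the sketch assumes the device list holds one entry per visible GPU.

```python
def resolve_device_id(requested_id, visible_gpu_count):
    """Return a device index that is valid for the visible GPUs.

    Hypothetical helper: falls back to index 0 when the requested
    index is outside the range of visible devices.
    """
    if visible_gpu_count <= 0:
        raise RuntimeError("no GPU is visible to this process")
    if 0 <= requested_id < visible_gpu_count:
        return requested_id
    return 0


# Placeholder standing in for the device list of a single-GPU machine.
dev_list = ["gpu:0"]

# device_id=5 would raise IndexError on this list; the helper falls back to 0.
device_id = resolve_device_id(5, len(dev_list))
place = dev_list[device_id]
print(device_id, place)
```

In a real script the GPU count would have to come from the framework. As far as I know, Paddle 2.x exposes `paddle.device.cuda.device_count()` for this, but treat that as an assumption to verify against your installed version.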

I'm on CUDA 11.2 with cuDNN 8.1, and now I get this error:

Error Message Summary:

FatalError: Segmentation fault is detected by the operating system.
  [TimeInfo: *** Aborted at 1655791849 (unix time) try "date -d @1655791849" if you are using GNU date ***]
  [SignalInfo: *** SIGSEGV (@0x8497) received by PID 1872 (TID 0x7f275a7fc700) from PID 33943 ***]

yangnianen, Jun 21 '22 06:06

From what I've read online, this is probably because the model doesn't support cuDNN 8 yet. Exhausting.

yangnianen, Jun 21 '22 06:06

Are there any plans to fix this?

yangnianen, Jun 21 '22 09:06

Running it in a container and changing device_id to 0 solves the problem.

docker pull paddlepaddle/paddle:2.3.1-gpu-cuda11.2-cudnn8

Tlntin, Aug 02 '22 08:08
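Since the thread suspects a CUDA/cuDNN mismatch, it may help to check which versions the installed Paddle build reports once the container is running. The following is a rough sketch, assuming the `paddle.version.cuda()` and `paddle.version.cudnn()` functions as found in Paddle 2.x. The function name `describe_paddle_build` is hypothetical.

```python
def describe_paddle_build():
    """Report the Paddle version and the CUDA/cuDNN versions it was built with.

    Returns None when paddle is not importable in this environment.
    """
    try:
        import paddle
    except ImportError:
        return None
    return {
        "paddle": paddle.__version__,
        # Assumed Paddle 2.x API: version strings of the build-time toolkits.
        "cuda": paddle.version.cuda(),
        "cudnn": paddle.version.cudnn(),
    }


info = describe_paddle_build()
print(info if info is not None else "paddle is not installed in this environment")
```

If the reported versions differ from what the image tag suggests, the environment is probably not the one you intended to run in.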