RuntimeError: The expanded size of the tensor (63) must match the existing size (64) at non-singleton dimension 0
I've encountered the following error while running predict.py with the RecoverSAT model:
Traceback (most recent call last):
  File "predict.py", line 93, in <module>
    main()
  File "predict.py", line 83, in main
    results = translator.translate(input_data)
  File "/code/translator.py", line 39, in translate
    batch_pred = self.recover_nat_translate_batch(batch)
  File "/code/translator.py", line 124, in recover_nat_translate_batch
    position = position.expand(cur_bsz, -1)
RuntimeError: The expanded size of the tensor (63) must match the existing size (64) at non-singleton dimension 0
The command I used:
python3 predict.py \
--model_path $MODEL_DIR/$CKPT.ckp \
--input_file $TEST_DATA_PREFIX.en \
--output_file $MEASURE_DIR/test.en.pred.$CKPT \
--vocab_path $VOCAB_FILE > $LOG_DIR/decode.$STYPE.$CKPT.log 2>&1
I'm not sure what has happened.
Hello! I got the following error after training for 1000 steps. What could be going on?
My command is as follows:
CUDA_VISIBLE_DEVICES='1','4','5','6','7' python train.py --model_name RecoverSAT --segment_num 2 --dataset IWSLT16 --init_encoder_path ./checkpoint-token-zh-ti/b4-epoch-208-batch-441.ckp --train_src_file ../corpus/token/train.token.zh.shuf --train_tgt_file ../corpus/token/train.token.ti.shuf --valid_src_file ../corpus/token/dev.token.zh --valid_tgt_file ../corpus/token/dev.token.ti --vocab_path ../corpus/token/vocab.token.zhti.txt
The error is as follows:
01/19/2021 08:48:31 - INFO - main - Epoch=0 batch=1000 step=1000 loss=5.383245
/pytorch/aten/src/ATen/native/cuda/LegacyDefinitions.cpp:19: UserWarning: masked_fill_ received a mask with dtype torch.uint8, this behavior is now deprecated, please use a mask with dtype torch.bool instead.
Traceback (most recent call last):
File "train.py", line 391, in
You are probably using a newer version of PyTorch, which enforces stricter type and operation checks: masked_fill_ no longer supports uint8 masks, only bool masks. You can either roll back to an older PyTorch version or change the variable's dtype (modify the original variable's type directly, or add a cast). I ran into this problem before, and changing the dtype fixed it. See the PyTorch documentation for details.
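For example, a minimal sketch of the cast (the tensor names here are illustrative, not the actual variables in the training code):

import torch

scores = torch.randn(2, 4)
mask = torch.tensor([[1, 0, 0, 1],
                     [0, 1, 0, 0]], dtype=torch.uint8)

# Deprecated (and later removed) on newer PyTorch: uint8 mask
# scores.masked_fill_(mask, float('-inf'))

# Fix: cast the mask to bool before calling masked_fill_
scores.masked_fill_(mask.bool(), float('-inf'))
print(scores)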
Thank you!
Maybe the size of position_base should be changed to (1 * segment_num) rather than (batch_size * segment_num) at initialization. expand() can only broadcast dimensions of size 1, so a tensor whose dimension 0 is already larger than 1 (here the configured batch size, 64) cannot be expanded to the smaller final batch (cur_bsz = 63, as in the traceback); with a singleton dimension 0 the expand succeeds for any batch size. A sketch of the change is below.
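A minimal sketch of that change, assuming position_base is built from torch.arange (the exact construction in translator.py, and the role segment_num plays in its shape, may differ):

import torch

segment_num = 2
batch_size = 64   # configured batch size
max_len = 32      # hypothetical maximum decoding length

# Before (hypothetical): dimension 0 is tied to batch_size, so
# position.expand(cur_bsz, -1) fails whenever the last batch is
# smaller than batch_size (63 vs. 64, as in the traceback):
# position_base = torch.arange(max_len).unsqueeze(0).repeat(batch_size * segment_num, 1)

# After: keep dimension 0 as a singleton so expand() can broadcast
# it to whatever the current batch size happens to be:
position_base = torch.arange(max_len).unsqueeze(0)  # shape (1, max_len)

cur_bsz = 63  # size of the final, smaller batch
position = position_base.expand(cur_bsz, -1)        # shape (63, max_len)
print(position.shape)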