ECSpell: code cannot be run

Open · LiyiLily opened this issue 2 years ago · 1 comment

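Hello. After setting up the environment I tried to run the code, and I also tried the fixes you suggested in your replies to other people's issues, but the code still does not run. The details are as follows: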

Set up glyce following https://github.com/ShannonAI/glyce
Ran Code/tokenize_sequence.py to generate the data in Data/traintest/sim/glyce/SIGHAN
Downloaded the model parameter folder ecspell_checkpoint from Google Drive
Changed the path in script.sh to CHECKPOINT=./ecspell_checkpoint/ecspell
Ran script.sh, which failed because checkpoint-19500 could not be found, so I changed line 304 of train_baseline.py to logger.info(f"{os.path.join(args.load_pretrain_checkpoint, 'results', 'checkpoint')}")
Ran script.sh again and got the following error:
    File "/home/admin/code/spelling_check/ECSpell/Code/data_processor.py", line 71, in __call__
      return BatchEncoding(batch_outputs, tensor_type="pt")
    File "/home/admin/miniconda3/envs/ecspell/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 209, in __init__
      self.convert_to_tensors(tensor_type=tensor_type, prepend_batch_axis=prepend_batch_axis)
    File "/home/admin/miniconda3/envs/ecspell/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 724, in convert_to_tensors
      "Unable to create tensor, you should probably activate truncation and/or padding "
    ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length.
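
For the missing checkpoint-19500 step, it can help to check which checkpoints the downloaded folder actually contains before editing the training script. The following is a minimal sketch, not ECSpell's own loading logic: the helper name `latest_checkpoint` is hypothetical, and the `checkpoint-<step>` subdirectory layout is an assumption inferred from the error message and the path in the post.

```python
import re
from pathlib import Path
from typing import Optional


def latest_checkpoint(results_dir: str) -> Optional[Path]:
    """Return the checkpoint-<step> subdirectory with the highest step.

    Hypothetical helper: assumes checkpoints are saved as subdirectories
    named "checkpoint-<step>". Returns None if the directory does not
    exist or contains no such subdirectory.
    """
    root = Path(results_dir)
    if not root.is_dir():
        return None

    pattern = re.compile(r"^checkpoint-(\d+)$")
    candidates = []
    for child in root.iterdir():
        match = pattern.match(child.name)
        if child.is_dir() and match:
            # Keep the numeric step so that 19500 sorts above 500.
            candidates.append((int(match.group(1)), child))

    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]
```

Calling `latest_checkpoint("./ecspell_checkpoint/ecspell/results")` (path assumed from the post) would show whether any checkpoint directory is present, which separates a wrong path from a missing download.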

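After debugging, I found that the training data is not padded, so the data is empty after IterableDatasetShard. Is there a problem with how I am using the data, or is something else wrong?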
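
The ValueError in the traceback is raised when sequences of different lengths are converted into a single tensor. A minimal sketch of padding a batch to a common length before that conversion is shown here. It is not the repository's data collator: the function `pad_batch`, the feature keys, and the pad values (0 for `input_ids` and `attention_mask`, -100 for `labels`) are assumptions for illustration.

```python
from typing import Dict, List

# Assumed pad values: 0 for token ids and attention mask, -100 for labels
# (-100 is the default ignore_index of PyTorch's CrossEntropyLoss).
PAD_VALUES = {"input_ids": 0, "attention_mask": 0, "labels": -100}


def pad_batch(
    features: List[Dict[str, List[int]]],
    pad_values: Dict[str, int],
) -> Dict[str, List[List[int]]]:
    """Right-pad every sequence in the batch to the longest input_ids length."""
    max_len = max(len(f["input_ids"]) for f in features)
    batch: Dict[str, List[List[int]]] = {key: [] for key in pad_values}
    for f in features:
        for key, pad in pad_values.items():
            seq = f[key]
            batch[key].append(seq + [pad] * (max_len - len(seq)))
    return batch


# Two examples of different lengths, which would fail without padding.
features = [
    {"input_ids": [101, 2769, 102], "attention_mask": [1, 1, 1], "labels": [0, 1, 0]},
    {"input_ids": [101, 2769, 4263, 872, 102], "attention_mask": [1, 1, 1, 1, 1], "labels": [0, 1, 1, 1, 0]},
]
batch = pad_batch(features, PAD_VALUES)
```

After this step every list under a given key has the same length, so the batch can be converted to rectangular tensors.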

I hope you can find time to check the code (download it and see whether it runs), or provide a detailed README to guide the setup.

Jul 18 '22 09:07 LiyiLily

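I am very sorry for causing you so much trouble. I have been quite busy recently revising the related manuscript and preparing another submission; I expect to update the repository within 1-2 weeks. I would also appreciate a more complete error message, since from the current output I cannot locate the source of the error.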

Jul 21 '22 12:07 aopolin-lv