Code fails to run

Open · LiyiLily opened this issue 2 years ago · 1 comment

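Hello. After setting up the environment I tried to run the code, and I also tried the fixes you suggested in your replies to other issues, but it still does not run. The details are as follows: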

  1. Set up glyce following https://github.com/ShannonAI/glyce
  2. Ran Code/tokenize_sequence.py to generate the data in Data/traintest/sim/glyce/SIGHAN
  3. Downloaded the model checkpoint folder ecspell_checkpoint from Google Drive
  4. In script.sh, changed the path to CHECKPOINT=./ecspell_checkpoint/ecspell
  5. Ran script.sh, which failed because checkpoint-19500 could not be found, so I changed line 304 of train_baseline.py to logger.info(f"{os.path.join(args.load_pretrain_checkpoint, 'results', 'checkpoint')}")
  6. Ran script.sh again and got the following error:

         File "/home/admin/code/spelling_check/ECSpell/Code/data_processor.py", line 71, in __call__
           return BatchEncoding(batch_outputs, tensor_type="pt")
         File "/home/admin/miniconda3/envs/ecspell/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 209, in __init__
           self.convert_to_tensors(tensor_type=tensor_type, prepend_batch_axis=prepend_batch_axis)
         File "/home/admin/miniconda3/envs/ecspell/lib/python3.7/site-packages/transformers/tokenization_utils_base.py", line 724, in convert_to_tensors
           "Unable to create tensor, you should probably activate truncation and/or padding "
         ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length.
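For the missing checkpoint-19500 in step 5, it may help to see which checkpoint directories the downloaded folder actually contains before editing the training script. The following is a minimal sketch, not part of the ECSpell code: the helper names `list_checkpoints` and `latest_checkpoint` are hypothetical, and it assumes checkpoints are stored in directories named `checkpoint-<step>` somewhere under the folder you pass in.

```python
import os
import re
from typing import List, Optional

# Assumption: checkpoint directories are named "checkpoint-<step>".
_CKPT_PATTERN = re.compile(r"^checkpoint-(\d+)$")


def list_checkpoints(root: str) -> List[str]:
    """Return all checkpoint-<step> directories under root, sorted by step."""
    found = []
    for dirpath, dirnames, _ in os.walk(root):
        for name in dirnames:
            match = _CKPT_PATTERN.match(name)
            if match:
                found.append((int(match.group(1)), os.path.join(dirpath, name)))
    return [path for _, path in sorted(found)]


def latest_checkpoint(root: str) -> Optional[str]:
    """Return the checkpoint directory with the highest step, or None."""
    checkpoints = list_checkpoints(root)
    return checkpoints[-1] if checkpoints else None


if __name__ == "__main__":
    # Path taken from step 4 above; adjust to your own layout.
    for path in list_checkpoints("./ecspell_checkpoint/ecspell"):
        print(path)
```

If this prints nothing, the path in CHECKPOINT probably does not point at the directory that holds the checkpoints.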

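After investigating, I found that the training data is not padded, so the data comes out empty after IterableDatasetShard. Is there a problem with how I am using the data, or is something else wrong?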
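To illustrate what the ValueError is asking for, here is a minimal pure-Python sketch of padding a batch to a common length before it is turned into tensors. It is not the collator in data_processor.py: the function `pad_batch` and the `PAD_VALUES` table are hypothetical, the pad id 0 is an assumption about the vocabulary, and -100 is used for labels because that is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
from typing import Dict, List

# Assumed pad values per field; check the tokenizer's real pad id.
PAD_VALUES = {"input_ids": 0, "attention_mask": 0, "labels": -100}


def pad_batch(features: List[Dict[str, List[int]]],
              pad_values: Dict[str, int]) -> Dict[str, List[List[int]]]:
    """Right-pad every field of every example to the longest input_ids."""
    if not features:
        # An empty batch cannot be padded; fail with a clear message.
        raise ValueError("empty batch: no features to pad")
    max_len = max(len(f["input_ids"]) for f in features)
    batch: Dict[str, List[List[int]]] = {key: [] for key in pad_values}
    for feature in features:
        for key, pad in pad_values.items():
            seq = feature[key]
            batch[key].append(seq + [pad] * (max_len - len(seq)))
    return batch


if __name__ == "__main__":
    examples = [
        {"input_ids": [101, 5, 102], "attention_mask": [1, 1, 1], "labels": [0, 5, 0]},
        {"input_ids": [101, 5, 6, 7, 102], "attention_mask": [1] * 5, "labels": [0, 5, 6, 7, 0]},
    ]
    padded = pad_batch(examples, PAD_VALUES)
    print({key: [len(row) for row in rows] for key, rows in padded.items()})
```

Once every row in a field has the same length, the batch can be converted to a tensor. The explicit check for an empty batch also separates "no data reached the collator" from "data reached it but was ragged".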

I hope you can find time to check the code by downloading it and seeing whether it runs, or provide a detailed README that walks through the steps.

LiyiLily, Jul 18 '22 09:07

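Sorry for the trouble this has caused. I have been busy lately revising the related manuscript and preparing another submission, and I expect to update the repository within 1-2 weeks. Could you also provide more complete error output? From the messages so far, I cannot yet locate the source of the error.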

aopolin-lv, Jul 21 '22 12:07