SoftMaskedBert-PyTorch
I get an error I can't resolve. Could the author please take a look?
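(gitabtion) F:\0code\gitabtion>python main.py --mode preproc
Namespace(accumulate_grad_batches=16, batch_size=16, bert_checkpoint='bert-base-chinese', device=device(type='cpu'), epochs=10, gpu_index=0, hard_device='cpu', load_checkpoint=False, loss_weight=0.8, lr=0.0001, mode='preproc', model_save_path='checkpoint', warmup_epochs=8)
preprocessing...
Traceback (most recent call last):
  File "main.py", line 99, in <module>
    main()
  File "main.py", line 63, in main
    preproc()
  File "F:\0code\gitabtion\src\data_processor.py", line 201, in preproc
    for item in read_data(get_abs_path('data')):
  File "F:\0code\gitabtion\src\data_processor.py", line 131, in read_data
    for line in f:
UnicodeDecodeError: 'gbk' codec can't decode byte 0xab in position 16: illegal multibyte sequence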
I tried editing line 129 of data_processor.py, setting encoding to 'utf-8', ascii, and so on, but none of them worked. This is really frustrating. What is the problem?
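The data processing script in this repository does have some problems. You can process the data with the BertBasedCorrectionModels repository instead, and then train with this repository.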
(gitabtion) F:\0code\gitabtion>python main.py --mode preproc
Namespace(accumulate_grad_batches=16, batch_size=16, bert_checkpoint='bert-base-chinese', device=device(type='cpu'), epochs=10, gpu_index=0, hard_device='cpu', load_checkpoint=False, loss_weight=0.8, lr=0.0001, mode='preproc', model_save_path='checkpoint', warmup_epochs=8)
preprocessing...
Traceback (most recent call last):
  File "main.py", line 99, in <module>
    main()
  File "main.py", line 63, in main
    preproc()
  File "F:\0code\gitabtion\src\data_processor.py", line 201, in preproc
    for item in read_data(get_abs_path('data')):
  File "F:\0code\gitabtion\src\data_processor.py", line 131, in read_data
    for line in f:
UnicodeDecodeError: 'gbk' codec can't decode byte 0xab in position 16: illegal multibyte sequence
I tried editing line 129 of data_processor.py, setting encoding to 'utf-8', ascii, and so on, but none of them worked. This is really frustrating. What is the problem?

Hey, how did you end up solving this?
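
For anyone hitting the same traceback: the 'gbk' in the error message suggests that open() was called without an encoding argument, so Python fell back to the Windows locale encoding. Below is a minimal sketch of one possible workaround, not the repository's actual code. The helper name read_text_with_fallback and the candidate encoding list are assumptions made for illustration; the real encoding of the data files is not stated in this thread, so the list may need adjusting.

```python
from pathlib import Path

# Hypothetical candidate list (an assumption, not taken from the repository).
# "utf-8-sig" also decodes plain UTF-8 files that have no BOM.
CANDIDATE_ENCODINGS = ("utf-8-sig", "gb18030", "big5")


def read_text_with_fallback(path, encodings=CANDIDATE_ENCODINGS):
    """Decode a file with the first candidate encoding that succeeds.

    Returns a (text, encoding) tuple so the caller can see which
    encoding was actually used.
    """
    raw = Path(path).read_bytes()  # read bytes first, decode explicitly
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # this encoding does not fit, try the next one
    raise ValueError(f"none of {encodings} could decode {path}")
```

Once the encoding that works is known, passing it explicitly, for example open(path, encoding='utf-8'), in the call that opens the data files should avoid the locale default. If the error still mentions 'gbk' after such a change, the edited line is probably not the open() call that the traceback goes through.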