
Problem encountered when training a new model

Open Hairmore opened this issue 1 year ago • 6 comments

python -u -m supar.cmds.dep.biaffine train -b -d 0 -c dep-biaffine-xlmr -p model --train train.conllx
--dev dev.conllx
--test test.conllx
-encoder=bert
--bert=xlm-roberta-large
--lr=5e-5
--lr-rate=20
--batch-size=500
--epoch=5
--update-steps=4

My data was originally in CoNLL-U format; I simply changed the file extension to .conllx. When running the command above I got:

File "supar\models\dep\biaffine\transform.py", line 422, in load
    for line in lines:
UnicodeDecodeError: 'gbk' codec can't decode byte 0x94 in position 39: illegal multibyte sequence

This error disappeared after I renamed the file as follows: train.conllx --> train.conllx.txt. Training then proceeded through "Building the fields" and "Building the model":

[2023-12-14 19:19:58 INFO] BiaffineDependencyModel(
  (encoder): TransformerEmbedding(xlm-roberta-large, n_layers=4, n_out=1024, stride=256, pooling=mean, pad_index=1, finetune=True)
  (encoder_dropout): Dropout(p=0.1, inplace=False)
  (arc_mlp_d): MLP(n_in=1024, n_out=500, dropout=0.33)
  (arc_mlp_h): MLP(n_in=1024, n_out=500, dropout=0.33)
  (rel_mlp_d): MLP(n_in=1024, n_out=100, dropout=0.33)
  (rel_mlp_h): MLP(n_in=1024, n_out=100, dropout=0.33)
  (arc_attn): Biaffine(n_in=500, bias_x=True)
  (rel_attn): Biaffine(n_in=100, n_out=2, bias_x=True, bias_y=True)
  (criterion): CrossEntropyLoss()
)

But an error was raised at the "caching the data" step: (screenshot: 捕获2)

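I am not sure whether this is a file format problem. Could I ask for a copy of your training data to test with? My data format is: (screenshot: 捕获) Thank you very much!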

Hairmore avatar Dec 14 '23 11:12 Hairmore
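
Since the question above is whether the file format is at fault, one quick check is whether every token line has the 10 tab-separated fields that CoNLL-U defines. The sketch below is a generic sanity check, not part of SuPar; the helper name `find_malformed_lines` and the sample sentence are made up for illustration.

```python
from typing import Iterable, List, Tuple

# CoNLL-U token lines have 10 fields:
# ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
N_FIELDS = 10


def find_malformed_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, line) for token lines without 10 tab-separated fields.

    Hypothetical helper for illustration; it does not validate field contents.
    """
    bad = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        # Blank lines separate sentences; lines starting with '#' are comments.
        if not line.strip() or line.startswith("#"):
            continue
        if len(line.split("\t")) != N_FIELDS:
            bad.append((number, line))
    return bad


# Made-up sample: the second token line uses spaces instead of tabs.
sample = [
    "# text = Hello world\n",
    "1\tHello\thello\tINTJ\t_\t_\t0\troot\t_\t_\n",
    "2 world world NOUN _ _ 1 vocative _ _\n",
    "\n",
]
for number, line in find_malformed_lines(sample):
    print(f"line {number}: expected {N_FIELDS} tab-separated fields: {line!r}")
```

To check a real file, pass an open file object instead of `sample`, opened with an explicit `encoding="utf-8"` so the check itself does not depend on the platform default.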

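@Hairmore Hello, sorry for the late reply. Please use UTF-8 encoding for .conllx files where possible. The .txt extension has a special purpose: it denotes a plain-text file.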

yzhangcs avatar Dec 25 '23 13:12 yzhangcs
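
The advice above to use UTF-8 can be checked mechanically. The following is a minimal sketch, assuming the original encoding of the file is known; the helper names `is_utf8` and `convert_to_utf8`, the file names, and the choice of GBK as the source encoding are illustrative assumptions, not anything taken from SuPar.

```python
import tempfile
from pathlib import Path


def is_utf8(path: Path) -> bool:
    """Return True if the whole file decodes cleanly as UTF-8."""
    try:
        path.read_bytes().decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def convert_to_utf8(src: Path, dst: Path, source_encoding: str) -> None:
    """Re-encode src (in source_encoding) as UTF-8 and write it to dst.

    Works on bytes so that line endings are left untouched.
    """
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))


# Demonstration on a throwaway file that is deliberately saved as GBK.
with tempfile.TemporaryDirectory() as tmp:
    gbk_file = Path(tmp) / "train_gbk.conllx"    # hypothetical file name
    utf8_file = Path(tmp) / "train_utf8.conllx"  # hypothetical file name
    gbk_file.write_bytes("1\t你好\t_\t_\t_\t_\t0\troot\t_\t_\n".encode("gbk"))

    before = is_utf8(gbk_file)
    convert_to_utf8(gbk_file, utf8_file, source_encoding="gbk")
    after = is_utf8(utf8_file)
    print(f"UTF-8 before conversion: {before}, after: {after}")
```

If the source encoding is unknown, guessing it is error-prone; re-exporting the data from the tool that produced it, with UTF-8 selected, is usually safer.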

It's fine, no need to apologize. I'm very grateful for your work and help! I have found the cause of this problem: the model was being trained under Windows. I switched to Linux and the problem disappeared. Thanks a lot!

Hairmore avatar Jan 10 '24 07:01 Hairmore
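
The Windows-versus-Linux difference reported above is consistent with how Python chooses a default text encoding: when a file is opened without an explicit `encoding`, Python falls back to a locale-dependent default, which is typically GBK (cp936) on a Chinese-locale Windows install and UTF-8 on most Linux systems. Whether SuPar passes an explicit encoding at the failing line is not shown in this thread, so treat the sketch below as an illustration of the mechanism only. The sample token line is made up.

```python
import locale

# The encoding Python uses for open() when none is given (platform dependent).
print("default text encoding:", locale.getpreferredencoding(False))

# A made-up CoNLL-style fragment containing U+2014, whose UTF-8 form is
# the three bytes E2 80 94.
data = "1\t\u2014\t\u2014\tPUNCT".encode("utf-8")
has_byte_0x94 = b"\x94" in data

# Decoding the same bytes as GBK fails, as in the traceback reported above.
try:
    data.decode("gbk")
    gbk_failed = False
except UnicodeDecodeError as err:
    gbk_failed = True
    print("gbk:", err)

# Decoding with the encoding the file was actually written in succeeds.
round_trip = data.decode("utf-8")
print("utf-8:", repr(round_trip))
```

In user code the usual remedy is to pass `encoding="utf-8"` to `open()`. For third-party code that cannot be edited, Python's UTF-8 Mode (the `PYTHONUTF8=1` environment variable or the `-X utf8` command-line option, available since Python 3.7) makes UTF-8 the default on Windows as well.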

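@Hairmore Hello, sorry for the late reply. Please use UTF-8 encoding for .conllx files where possible. The .txt extension has a special purpose: it denotes a plain-text file.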

Oh, the "txt" was to work around another weird problem. Under Windows, if I use the .conllu extension, that problem appears, but after appending .txt to the file name it goes away. I still don't know why.

Sorry for using English; I haven't set up a Chinese input method on my Ubuntu machine yet.

Hairmore avatar Jan 10 '24 08:01 Hairmore

I recommend using CoNLL-U format files with the .conllu/.conllx extension on Linux, which is what I do myself.

yzhangcs avatar Jan 10 '24 08:01 yzhangcs

I recommend using CoNLL-U format files with the .conllu/.conllx extension on Linux, which is what I do myself.

Yes, under Linux with .conllu, everything went smoothly.

Hairmore avatar Jan 10 '24 08:01 Hairmore

This issue is stale because it has been open for 30 days with no activity.

github-actions[bot] avatar Feb 11 '24 00:02 github-actions[bot]

This issue was closed because it has been inactive for 7 days since being marked as stale.

github-actions[bot] avatar Feb 25 '24 00:02 github-actions[bot]