Chuan Meng

5 comments by Chuan Meng

I looked at this carefully and I don't think there is an error: gammar_r_l is [tagset_size, tagset_size], and trainsitions is also [tagset_size, tagset_size].

I have the same concern.

@lywinged Hi, do you keep pad_token_id set to 0 for both training and batched inference?

What is the difference between optim="paged_adamw_8bit" and optim="paged_adamw_32bit"?

I am sorry that our released code was not designed with an interactive mode. If you want to modify it, I suggest you could transfer the...