Chuan Meng
I looked at this carefully and I don't think there is an error: gammar_r_l has shape [tagset_size, tagset_size], and trainsitions also has shape [tagset_size, tagset_size].
I have the same concern.
@lywinged Hi, do you keep pad_token_id set to 0 for both training and batch-based inference?
What is the difference between optim="paged_adamw_8bit" and optim="paged_adamw_32bit"?
I am sorry that our released code was not specifically designed to support an interactive mode. If you want to modify it, I suggest you could transfer the...