Shawn Xu
I checked the WuDao dataset and found some irregular question marks in the text. Could this be the cause of the problem?
OK. The cause is that, during pretraining, the trained tokenizer encounters tokens it has never seen, such as "岿". Maybe the vocabulary of GLM10bchinese is not big enough.
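One rough way to check both suspicions is to scan the corpus for characters the vocabulary does not cover and for Unicode replacement characters. This is only a sketch: it assumes the "irregular question marks" are U+FFFD (they could also be literal "?"), it treats the vocabulary as a plain set of characters, which only approximates a subword tokenizer, and the names `find_uncovered_chars`, `count_replacement_chars`, `vocab_chars` and `sample` are made up for illustration, not taken from GLM or WuDao.

```python
from collections import Counter


def find_uncovered_chars(lines, vocab_chars):
    """Count non-whitespace characters that are absent from vocab_chars."""
    missing = Counter()
    for line in lines:
        for ch in line:
            if not ch.isspace() and ch not in vocab_chars:
                missing[ch] += 1
    return missing


def count_replacement_chars(lines):
    """Count U+FFFD, which decoders emit when they hit undecodable bytes."""
    return sum(line.count("\ufffd") for line in lines)


# Toy data standing in for the real vocabulary and corpus (assumption).
vocab_chars = set("山高水长。")
sample = ["山高水长。", "岿然不动\ufffd"]

missing = find_uncovered_chars(sample, vocab_chars)
for ch, n in missing.most_common():
    print(repr(ch), n)
print("replacement chars:", count_replacement_chars(sample))
```

If rare characters such as "岿" dominate the uncovered list, the vocabulary is the more likely culprit. A high count of replacement characters points to encoding damage in the data instead.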