Shen Hao comments


In ChatGLM, tokenizer(question) produces question+[gMASK]+<sop>. If I tokenize the input myself as [gMASK]+<sop>+question instead, are both layouts acceptable?

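My question is about the chat function in ChatGLM1. Its user field is "[Round {{idx}}]\n\n问：{{content}}\n\n答：", and after tokenizer(input) runs, [gMASK] and sop are appended at the end. I want to use my own tokenizer function. Can I encode the input as [gMASK]+sop+input instead? Here [gMASK] is the marker that tells the model to generate, and sop is the start-of-sequence marker.

For ChatGLM3 I know the {{content}} field is required. Same question there: should I use [gMASK]+sop+{{content}} or {{content}}+[gMASK]+sop?

![image](https://github.com/THUDM/ChatGLM3/assets/65658684/b0441ee7-66ff-4e2f-85d4-a427678f3747)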
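To make the two candidate layouts concrete, here is a minimal sketch in plain Python. The token ids and the names `GMASK_ID`, `SOP_ID`, `format_round`, `suffix_layout`, and `prefix_layout` are placeholders made up for illustration; they are not values or functions from the ChatGLM tokenizer.

```python
# Placeholder ids for illustration only; the real values come from the tokenizer.
GMASK_ID = 1001
SOP_ID = 1002


def format_round(idx: int, content: str) -> str:
    """Fill in the ChatGLM1 user template quoted in the question."""
    return f"[Round {idx}]\n\n问：{content}\n\n答："


def suffix_layout(input_ids):
    """input + [gMASK] + sop: special tokens appended after the text."""
    return list(input_ids) + [GMASK_ID, SOP_ID]


def prefix_layout(input_ids):
    """[gMASK] + sop + input: special tokens placed before the text."""
    return [GMASK_ID, SOP_ID] + list(input_ids)


fake_ids = [11, 12, 13]  # stand-in for the ids of a tokenized prompt
print(suffix_layout(fake_ids))
print(prefix_layout(fake_ids))
```

Comparing either list against `tokenizer(prompt)["input_ids"]` from the actual checkpoint should show which layout that tokenizer produces by default.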

Hello, I am now fairly sure that for chatglm-6b the layout should be input+[gMASK]+sop, not [gMASK]+sop+input. Why do the two layouts produce such different outputs?

![image](https://github.com/THUDM/ChatGLM3/assets/65658684/82c492d8-e9fa-44ec-8a3a-57586d737e90)
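One possible explanation, sketched below under an assumption I have not verified against the model code: that a GLM-style model treats everything before the first sop token as context and everything from sop onward as the part being generated. The ids and the helper `split_at_sop` are hypothetical and only illustrate how the split would fall for each layout.

```python
# Placeholder ids for illustration only; the real values come from the tokenizer.
GMASK_ID = 1001
SOP_ID = 1002


def split_at_sop(ids):
    """Split a sequence at the first sop token.

    Assumption: tokens before sop act as context, and tokens from sop
    onward belong to the generation side.
    """
    boundary = ids.index(SOP_ID)
    return ids[:boundary], ids[boundary:]


question = [11, 12, 13]  # stand-in for the ids of a tokenized prompt

# input + [gMASK] + sop: the whole question sits in the context.
print(split_at_sop(question + [GMASK_ID, SOP_ID]))

# [gMASK] + sop + input: only [gMASK] is left in the context.
print(split_at_sop([GMASK_ID, SOP_ID] + question))
```

If the assumption holds, the [gMASK]+sop+input layout leaves only [gMASK] as context and moves the entire question onto the generation side, which would be consistent with the two outputs differing this much.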