ChatGLM-6B: inference result file contains <img_-100>

Is there an existing issue for this?

[X] I have searched the existing issues

Current Behavior

{"labels": "<image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100>xxxxxxxxxxxxxxxxxxxx", "predict": "xxxxxxxxxxxxxxxxxx"}

What is <image_-100>?

Expected Behavior

No response

Steps To Reproduce

ef

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :
efg

Anything else?

No response

Jun 02 '23 06:06 sanwei111

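Bumping this, I ran into it as well. With the v1.1.0 model, these characters show up in the labels. Someone suggested that switching to the v0.1.0 model makes them go away, so it may be a problem with the v1.1.0 tokenizer. Could the maintainers confirm whether these characters affect fine-tuning and inference results?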
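One possible reading, offered as an assumption and not confirmed anywhere in this thread: -100 is the default `ignore_index` of PyTorch's cross-entropy loss, and fine-tuning scripts commonly fill padded label positions with it so they are excluded from the loss. If such label ids are decoded without removing the -100 entries first, a tokenizer may render them as placeholder strings. The sketch below uses a hypothetical `toy_decode` stand-in (not the real ChatGLM tokenizer) to show the effect and a simple filter applied before decoding.

```python
from typing import List

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def drop_ignored(label_ids: List[int], ignore_index: int = IGNORE_INDEX) -> List[int]:
    """Return label_ids without the loss-masking sentinel value."""
    return [i for i in label_ids if i != ignore_index]


def toy_decode(ids: List[int]) -> str:
    """Hypothetical stand-in for a tokenizer (NOT the ChatGLM one):
    negative ids become placeholders, other ids map to letters a-z."""
    return "".join(
        f"<image_{i}>" if i < 0 else chr(ord("a") + i % 26) for i in ids
    )


# Labels as a fine-tuning script might store them: masked prefix, then text.
labels = [-100, -100, 7, 4, 11, 11, 14]

raw_text = toy_decode(labels)                  # placeholders leak into the text
clean_text = toy_decode(drop_ignored(labels))  # sentinel removed before decoding
```

Under this reading the placeholders would be an artifact of decoding the labels for the output file, not something fed to the model, but that remains an assumption until the maintainers confirm it.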

Jun 06 '23 03:06 yqkgithup

I'm hitting this problem too. Hoping for an official answer.

Jun 06 '23 08:06 huilinbo

Same question.

Jun 24 '23 10:06 lhy101

Edit tokenization_chatglm.py and change the last line of _decode:


    # Method of the tokenizer class in tokenization_chatglm.py
    # (Union and List come from typing).
    def _decode(
            self,
            token_ids: Union[int, List[int]],
            **kwargs
    ) -> str:
        if isinstance(token_ids, int):
            token_ids = [token_ids]
        if len(token_ids) == 0:
            return ""
        if self.pad_token_id in token_ids:  # remove pad
            token_ids = list(filter((self.pad_token_id).__ne__, token_ids))
        # Last line: decode through self.sp_tokenizer directly.
        return self.sp_tokenizer.decode(token_ids)

The original authors probably won't bother with small details like this, so you still have to read the code yourself.
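For result files that were already generated, a post-processing pass is another option. The following is a minimal sketch, assuming the file holds one JSON object per line with "labels" and "predict" string fields, as in the sample from the issue body; `clean_record` is a hypothetical helper name, not part of the ChatGLM-6B repository.

```python
import json
import re

# Matches the literal placeholder seen in the "labels" field.
PLACEHOLDER = re.compile(r"<image_-100>")


def clean_record(line: str) -> str:
    """Strip <image_-100> placeholders from the "labels" field of one JSON line."""
    record = json.loads(line)
    record["labels"] = PLACEHOLDER.sub("", record["labels"])
    # ensure_ascii=False keeps Chinese text readable in the output.
    return json.dumps(record, ensure_ascii=False)


sample = '{"labels": "<image_-100><image_-100>xxxx", "predict": "xxxx"}'
cleaned = clean_record(sample)
```

This only tidies the text in the output file; it does not change how the tokenizer decodes ids, so the `_decode` edit above is still what addresses the source of the placeholders.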

Jun 26 '23 02:06 xiningnlp
