ChatGLM-6B

The inference result file contains <img_-100>

Open sanwei111 opened this issue 2 years ago • 4 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Current Behavior

{"labels": "<image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100>xxxxxxxxxxxxxxxxxxxx", "predict": "xxxxxxxxxxxxxxxxxx"}

What is <image_-100>?

Expected Behavior

No response

Steps To Reproduce

ef

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :
efg

Anything else?

No response

sanwei111, Jun 02 '23 06:06

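Bumping this: I hit it too. With the v1.1.0 model these characters show up in the labels. Someone said switching to the v0.1.0 model fixes it, so it may be a problem with the v1.1.0 tokenizer. Could the maintainers confirm whether these characters affect fine-tuning and inference results?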
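A minimal sketch of why the markers are probably harmless for training, assuming (not confirmed anywhere in this thread) that `<image_-100>` is simply the decoded form of label id `-100`, the ignore index that PyTorch's `CrossEntropyLoss` skips by default. The NumPy function `masked_cross_entropy` below is a hypothetical stand-in for that loss, not ChatGLM code: positions labeled `-100` are masked out, so they contribute nothing to the average.

```python
import numpy as np

IGNORE_INDEX = -100  # assumed label value for prompt/padding positions


def masked_cross_entropy(logits, labels):
    """Mean token-level cross-entropy that skips IGNORE_INDEX positions.

    logits: float array of shape (seq_len, vocab_size)
    labels: int array of shape (seq_len,)
    """
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    mask = labels != IGNORE_INDEX
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the gold token's log-probability only at unmasked positions.
    picked = log_probs[mask, labels[mask]]
    return -picked.mean()
```

Changing the logits at a position labeled `-100` leaves the loss unchanged, which is the behavior you would expect if these markers only mark ignored label slots.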

yqkgithup, Jun 06 '23 03:06

I ran into this problem too. Hoping for an official answer.

huilinbo, Jun 06 '23 08:06

Same question here.

lhy101, Jun 24 '23 10:06

Modify `tokenization_chatglm.py`, changing the last line of `_decode`:


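```python
# Method of the tokenizer class in tokenization_chatglm.py
# (the file needs `from typing import List, Union` at the top).
def _decode(
        self,
        token_ids: Union[int, List[int]],
        **kwargs
) -> str:
    if isinstance(token_ids, int):
        token_ids = [token_ids]
    if len(token_ids) == 0:
        return ""
    if self.pad_token_id in token_ids:  # remove pad
        token_ids = list(filter((self.pad_token_id).__ne__, token_ids))
    # Changed last line: decode with the underlying SentencePiece tokenizer
    return self.sp_tokenizer.decode(token_ids)
```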
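An alternative that leaves the tokenizer untouched, again assuming the markers are decoded `-100` label ids: drop those ids before decoding. `clean_label_ids` is a hypothetical helper, not part of ChatGLM-6B; in a script that writes the `labels`/`predict` pairs, you would pass its output to `tokenizer.batch_decode` instead of the raw label ids.

```python
from typing import List, Sequence

IGNORE_INDEX = -100  # assumed loss-masking value in the label tensors


def clean_label_ids(batch: Sequence[Sequence[int]]) -> List[List[int]]:
    """Remove IGNORE_INDEX entries so the tokenizer never sees negative ids.

    Works on plain lists or on rows of a NumPy array.
    """
    return [[int(t) for t in seq if int(t) != IGNORE_INDEX] for seq in batch]
```

For example, `clean_label_ids([[-100, -100, 5, 6], [7, -100]])` returns `[[5, 6], [7]]`. Filtering the ids outright, rather than replacing them with the pad id, means the result does not depend on `_decode` stripping padding.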

The original authors probably don't bother with small details like this, so you have to read the code yourself.

xiningnlp, Jun 26 '23 02:06