ChatGLM-6B
Inference result file contains <img_-100>
Is there an existing issue for this?
- [X] I have searched the existing issues
Current Behavior
{"labels": "<image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100><image_-100>xxxxxxxxxxxxxxxxxxxx", "predict": "xxxxxxxxxxxxxxxxxx"}
What is <image_-100>?
Expected Behavior
No response
Steps To Reproduce
ef
Environment
- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA Support (`python -c "import torch; print(torch.cuda.is_available())"`) :
efg
Anything else?
No response
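Bumping this, I ran into it too. With the v1.1.0 model, the labels contain these characters. An experienced user said that switching to the v0.1.0 model fixes it, so it may be a problem with the v1.1.0 tokenizer. Could the maintainers confirm whether these characters affect fine-tuning and inference results?
I am hitting this problem too. Hoping for an official answer.
Same question.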
Modify tokenization_chatglm.py (the last line):
    def _decode(
        self,
        token_ids: Union[int, List[int]],
        **kwargs
    ) -> str:
        if isinstance(token_ids, int):
            token_ids = [token_ids]
        if len(token_ids) == 0:
            return ""
        if self.pad_token_id in token_ids:  # remove pad
            token_ids = list(filter((self.pad_token_id).__ne__, token_ids))
        return self.sp_tokenizer.decode(token_ids)
The original authors probably can't be bothered with small details like this, so you still have to read the code yourself.
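The id-filtering step in `_decode` can be tried in isolation, without loading the model. The following is a minimal sketch, not code from the repository: `strip_ids` is a hypothetical helper, and the pad id of 3 and the masking value of -100 are assumptions chosen only for illustration (-100 is the number that shows up in the `<image_-100>` strings reported here). It removes several unwanted ids in one pass, then shows the single-id `filter(... .__ne__, ...)` form from the snippet for comparison.

```python
from typing import Iterable, List, Union


def strip_ids(token_ids: Union[int, List[int]], drop: Iterable[int]) -> List[int]:
    """Return token_ids with every id listed in `drop` removed.

    Hypothetical helper: it mirrors the pad-removal step of `_decode`,
    but accepts several ids to drop instead of only pad_token_id.
    """
    if isinstance(token_ids, int):
        token_ids = [token_ids]
    drop_set = set(drop)
    return [t for t in token_ids if t not in drop_set]


PAD_ID = 3        # assumed pad id, for illustration only
IGNORE_ID = -100  # assumed label-masking value, as seen in <image_-100>

label_ids = [IGNORE_ID, IGNORE_ID, 5, 20, 12, PAD_ID, PAD_ID]

# Drop both the pad id and the masking value before decoding.
cleaned = strip_ids(label_ids, drop=(PAD_ID, IGNORE_ID))
print(cleaned)

# The single-id form used in `_decode` removes only the pad id,
# so any -100 entries would still reach the decoder.
pad_only = list(filter((PAD_ID).__ne__, label_ids))
print(pad_only)
```

If the -100 entries do come from label masking in your pipeline, it may be simpler to replace them with the pad id before calling the tokenizer, so that the existing pad filter handles them.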