Chinese-LLaMA-Alpaca
After merging vocabularies with merge_tokenizers, the fast tokenizer and the regular tokenizer load vocabularies of different lengths.
After merging the Chinese vocabulary with merge_tokenizers.py, the non-fast tokenizer reports a vocabulary of roughly 60K tokens, while the fast tokenizer reports only about 46K. Is this a bug?
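For reference, here is a minimal sketch of how the two vocabulary sizes can be compared after running merge_tokenizers.py; the output directory path is hypothetical.

```python
# Minimal sketch: compare the vocab size seen by the slow (sentencepiece-based)
# tokenizer with the one seen by the fast tokenizer after merging.
# "./merged_tokenizer_hf" is a placeholder for the merge_tokenizers.py output dir.
from transformers import LlamaTokenizer, LlamaTokenizerFast

merged_dir = "./merged_tokenizer_hf"

slow_tok = LlamaTokenizer.from_pretrained(merged_dir)
fast_tok = LlamaTokenizerFast.from_pretrained(merged_dir)

print("slow tokenizer vocab size:", len(slow_tok))  # ~60K in the report above
print("fast tokenizer vocab size:", len(fast_tok))  # ~46K in the report above
```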
The merge_tokenizers code has not been tested with fast tokenizers. We recommend using the regular (slow) tokenizer.
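A minimal sketch of that workaround, assuming the merged vocabulary sits in a local directory (path hypothetical): force the non-fast path with `use_fast=False`.

```python
# Load the merged vocabulary with the regular (slow) tokenizer, as suggested above.
from transformers import AutoTokenizer

merged_dir = "./merged_tokenizer_hf"  # hypothetical path to the merged tokenizer
tokenizer = AutoTokenizer.from_pretrained(merged_dir, use_fast=False)

print(len(tokenizer))  # should reflect the full merged vocabulary
```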
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your consideration.