ChatGLM-Tuning

chore: two minor fixes

Open WindRunnerMax opened this issue 1 year ago • 0 comments

Two minor fixes

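  1. Removed the extra space in the arguments of the python tokenize_dataset_rows.py command in the README. With the extra space, the escape character escaped the space instead of the line break.
  2. In cover_alpaca2jsonl.py, json.dumps escapes Chinese characters, which makes the generated jsonl file slightly less readable. json.loads converts the escapes back, so functionality is not affected.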
>>> import json
>>> print(json.dumps({ "intro": "测试"}))
{"intro": "\u6d4b\u8bd5"}
>>> print(json.dumps({ "intro": "测试" }, ensure_ascii=False))
{"intro": "测试"}
>>> print(json.loads(r'{"intro": "\u6d4b\u8bd5"}'))
{'intro': '测试'}
>>> print(json.loads('{"intro": "测试"}'))
{'intro': '测试'}
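
As a minimal sketch of the second fix, the snippet below writes and reads a jsonl file with ensure_ascii=False. The helper names to_jsonl_line, write_jsonl and read_jsonl are hypothetical and are not taken from cover_alpaca2jsonl.py. The file is opened with encoding="utf-8" on the assumption that unescaped non-ASCII text should be stored as UTF-8 regardless of the platform default.

```python
import json


def to_jsonl_line(record):
    # Hypothetical helper: ensure_ascii=False keeps Chinese characters
    # readable instead of emitting \uXXXX escapes.
    return json.dumps(record, ensure_ascii=False)


def write_jsonl(records, path):
    # Hypothetical helper: write one JSON object per line, as UTF-8.
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(to_jsonl_line(record) + "\n")


def read_jsonl(path):
    # json.loads accepts both the escaped and the unescaped form.
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Either form loads back to the same Python objects, so this only changes how readable the file is, not its meaning.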

WindRunnerMax, Apr 09 '23 13:04