GLM-4 icon indicating copy to clipboard operation
GLM-4 copied to clipboard

用给的示示例数据tools的数据微调,后面自动多了一个Tools:None,数据处理报异常

Open mudeguo opened this issue 4 months ago • 2 comments

System Info / 系統信息

python3.12,transformer 43;gpu 2080Ti22g*2

Who can help? / 谁可以帮助到您?

No response

Information / 问题信息

  • [X] The official example scripts / 官方的示例脚本
  • [ ] My own modified scripts / 我自己修改的脚本和任务

Reproduction / 复现过程

batchjed_conv: 50 conv: [{'role': 'system', 'content': '', 'tools': [{'type': 'function', 'function': {'name': 'get_recommended_books', 'description': "Get recommended books based on user's interests1", 'parameters': {'type': 'object', 'properties': {'interests': {'type': 'array', 'items': {'type': 'string'}, 'description': 'The interests to recommend books for'}}, 'required': ['interests']}}}]}, {'role': 'user', 'content': 'Hi, I am looking for some book recommendations. I am interested in history and science fiction.', 'tools': None}, {'role': 'assistant', 'content': '{"name": "get_recommended_books", "arguments": {"interests": ["history", "science fiction"]}}', 'tools': None}, {'role': 'observation', 'content': '{"books": ["Sapiens: A Brief History of Humankind by Yuval Noah Harari", "A Brief History of Time by Stephen Hawking", "Dune by Frank Herbert", "The Martian by Andy Weir"]}', 'tools': None}, {'role': 'assistant', 'content': 'Based on your interests in history and science fiction, I would recommend the following books: "Sapiens: A Brief History of Humankind" by Yuval Noah Harari, "A Brief History of Time" by Stephen Hawking, "Dune" by Frank Herbert, and "The Martian" by Andy Weir.', 'tools': None}] Map: 0%| | 0/50 [00:00<?, ? examples/s] [rank1]: ╭────────────────────────────────────────────────────────────────────────────────────────────────── Traceback (most recent call last) ──────────────────────────────────────────────────────────────────────────────────────────────────╮ [rank1]: │ /data/projects/GLM-4/finetune_demo/finetune.py:419 in main │ [rank1]: │ │ [rank1]: │ 416 │ tokenizer, model = load_tokenizer_and_model(model_dir, peft_config=ft_config.peft_co │ [rank1]: │ 417 │ data_manager = DataManager(data_dir, ft_config.data_config) │ [rank1]: │ 418 │ │ [rank1]: │ ❱ 419 │ train_dataset = data_manager.get_dataset( │ [rank1]: │ 420 │ │ Split.TRAIN, │ [rank1]: │ 421 │ │ functools.partial( │ [rank1]: │ 422 │ │ │ process_batch, │ [rank1]: │

Expected behavior / 期待表现

希望能提供tools微调时的jsonl文件,能够跑通不报错。 trouble

mudeguo avatar Oct 06 '24 02:10 mudeguo