ms-swift3 Suggestion Box
- [x] De-emphasize the concept of model_type, supporting automatic detection of model_type using only <model_id_or_path> (config.json).
- [x] The template module and dataset module embrace the messages dataset format.
- [x] Remove the concept of generation-template. Use the use_generate_template parameter to control fetching the template needed by the base model, in order to support CPT for all multimodal models.
- [x] Make the preprocessor module smarter; introduce AutoPreprocessor.
- [x] Support customization of training functionalities with a plugin design, such as loss_type, loss_scale, trainer, optimizer, callback, metric.
- [x] Enhance code readability with a hierarchical design, allowing users with different needs to utilize and redevelop ms-swift through code, command line, and web UI.
- [x] Refactor documentation and examples.
- [x] Unified inference and deployment interface, utilizing class design to support vllm/lmdeploy/pt/client.
- [x] PT supports batch inference
- [x] PT supports multi-GPU/DeepSpeed
- [x] Optimize the multi-LoRA inference experience.
- [x] Optimize the encode/post_encode training mechanism for multimodal models.
- [x] Enhance the training robustness during large-scale pre-training.
- [x] Optimize the integration process for continued fine-tuning, inference, quantization, and deployment of full-parameter fine-tuned models with other training frameworks.
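For context, the "messages" format referred to above is the OpenAI-style chat schema. A minimal sketch of a single sample follows; the field names use the common convention and are not guaranteed to match ms-swift's exact schema:

```python
import json

# One training sample in the messages format: a list of role/content turns.
# Field names follow the widely used OpenAI-style convention; the exact
# schema accepted by ms-swift may differ slightly.
sample = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
    ]
}

# Datasets are typically stored as one JSON object per line (.jsonl).
line = json.dumps(sample, ensure_ascii=False)
print(line)
```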
Thank you very much for your team's hard work! Regarding the third point, is there any plan to add a channel-loss-style monitoring feature, i.e., observing the loss trend separately for each downstream task's dataset? I see that version 2.5 already supports PT for MLLMs, and I think this feature is quite important for doing post pre-training of MLLMs. Hope you'll consider it :>
Hello, thank you very much for your open-source work! Are there any plans to support RAG in the future?
Yes, but it probably won't make it into 3.0; it will likely land around 3.1/3.2.
Hello, could you add an interface for custom evaluation metrics?
Yes, this is an important feature.
Will 3.0 include a complete demo for multi-GPU NPU adaptation?
Hello, will the training pipeline support TP, PP, etc.?
> Will 3.0 include a complete demo for multi-GPU NPU adaptation?

That depends on whether we can borrow the hardware 😊
Allow changing dataset column names from HuggingFace/ModelScope to a swift-supported format. Currently you have to download a dataset from Hugging Face, rename its columns, and re-upload it to use it with swift.
Also, swift's dataset preparation is very strict despite using --check_dataset_strategy none. It logs cryptic errors that do not explain what went wrong.
For example, it will not accept a dataset that has a User message as the last message.
Another example: it doesn't accept a dataset with repeating roles such as Assistant -> Assistant -> User.
It also complained about KeyError: 'conversations' in a dataset that didn't have a conversations column at all.
In a large dataset it's impossible to check and fix every row. There should be an option to continue despite these issues.
Great suggestion, thank you!
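The offline workaround described in the thread (download, rename columns, re-upload) can be sketched with the standard library alone. The source column names `instruction`/`output` below are hypothetical placeholders; substitute your dataset's actual columns:

```python
import json

# Hypothetical source rows whose column names swift does not recognize.
rows = [
    {"instruction": "Translate 'hello' to French.", "output": "bonjour"},
    {"instruction": "What is 2 + 2?", "output": "4"},
]

def to_messages(row):
    """Map one row onto the messages format swift understands."""
    return {
        "messages": [
            {"role": "user", "content": row["instruction"]},
            {"role": "assistant", "content": row["output"]},
        ]
    }

# Write the converted dataset as .jsonl, ready to re-upload or use locally.
with open("converted.jsonl", "w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(to_messages(row), ensure_ascii=False) + "\n")
```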
> Hello, will the training pipeline support TP, PP, etc.?

Megatron support optimizations will come after the ms-swift 3.0 refactor, roughly 1-2 months from now.
> Thank you very much for your team's hard work! Regarding the third point, is there any plan to add a channel-loss-style monitoring feature, i.e., observing the loss trend separately for each downstream task's dataset?

OK, we will add this; it is a very common requirement. Thanks to you both.
Please include code for end-to-end fine-tuning / pre-training of audio language models in your existing pipeline, e.g. Llama 3.1 Omni. If possible, please also integrate the moshi audio language model.
channel loss: related issue: https://github.com/modelscope/ms-swift/issues/2220
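The channel-loss idea can be sketched in a few lines: tag each sample with a channel (e.g. its source dataset) and aggregate the loss per channel during training. This is an illustrative sketch, not ms-swift's implementation; the class and field names are invented:

```python
from collections import defaultdict

class ChannelLossTracker:
    """Accumulate a running mean of the loss separately for each channel."""

    def __init__(self):
        self.total = defaultdict(float)
        self.count = defaultdict(int)

    def update(self, channel, loss):
        self.total[channel] += loss
        self.count[channel] += 1

    def means(self):
        return {c: self.total[c] / self.count[c] for c in self.total}

# Illustrative per-batch losses tagged with their source dataset ("channel").
tracker = ChannelLossTracker()
for channel, loss in [("ocr", 0.8), ("vqa", 1.2), ("ocr", 0.6), ("vqa", 1.0)]:
    tracker.update(channel, loss)

print(tracker.means())  # per-channel loss trends can then be logged or plotted
```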
Could you build an official NPU version of the swift container image?
Can multimodal models support distributing GPU memory evenly across multiple cards?
DeepSpeed ZeRO-2/ZeRO-3 distributes it evenly.
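For context on the reply above: even memory distribution is what ZeRO stage 2/3 provides by partitioning model states across GPUs. A minimal sketch of a DeepSpeed ZeRO-3 config follows, with core keys only; real configs typically also set optimizer and offload options:

```python
import json

# Minimal DeepSpeed config enabling ZeRO stage 3, which partitions
# parameters, gradients, and optimizer states across all GPUs.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {"stage": 3},
}

with open("ds_zero3.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```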
After fine-tuning GOT-OCR2.0, I get a model_type error when using the saved fine-tuned model. How do I fix this?
You need to merge the LoRA; only then will there be a config.json file.
Solved, thanks! cd into the fine-tuned model directory and run: swift merge-lora --ckpt_dir xxx
Hello, when currently fine-tuning a multimodal large model with multi-turn dialogues configured, the .jsonl file saved after running infer only contains the result of the last turn in response, while the history field contains the label's history. Could infer later support saving the full results of all dialogue turns?
Looking forward to training and fine-tuning support for audio large models such as CosyVoice being added soon.
> Hello, will the training pipeline support TP, PP, etc.?
>
> Megatron support optimizations will come after the ms-swift 3.0 refactor, roughly 1-2 months from now.

Is there hope of support this year? 😂
https://github.com/modelscope/ms-swift/pull/2030/
The default version installed via pip is 2.6.1, and I can't find version 3.0 even when pinning the version; the source master branch is also 2.6.1. How do I install ms-swift 3.0?
> How do I download version 3.0 of the swift package? The currently downloaded package version is 2.6.1.

`pip install git+https://github.com/modelscope/ms-swift.git`
> Hello, will the training pipeline support TP, PP, etc.?
>
> Megatron support optimizations will come after the ms-swift 3.0 refactor, roughly 1-2 months from now.
>
> Is there hope of support this year? 😂

@Jintao-Huang Is there still a chance this year?