ms-swift3 Suggestion Box
- [x] De-emphasize the concept of model_type, supporting automatic detection of model_type using only <model_id_or_path> (config.json).
- [x] The template module and dataset module embrace the messages dataset format.
- [x] Remove the concept of generation-template. Use the use_generate_template parameter to control fetching the template needed by the base model, in order to support CPT for all multimodal models.
- [x] Make the preprocessor module smarter; introduce AutoPreprocessor.
- [x] Support customization of training functionalities with a plugin design, such as loss_type, loss_scale, trainer, optimizer, callback, metric.
- [x] Enhance code readability with a hierarchical design, allowing users with different needs to utilize and redevelop ms-swift through code, command line, and web UI.
- [x] Refactor documentation and examples.
- [x] Unified inference and deployment interface, utilizing class design to support vllm/lmdeploy/pt/client.
- [x] PT supports batch inference
- [x] PT supports multi-GPU/DeepSpeed
- [x] Optimize the multi-LoRA inference experience.
- [x] Optimize the encode/post_encode training mechanism for multimodal models.
- [x] Enhance the training robustness during large-scale pre-training.
- [x] Optimize the integration process for continued fine-tuning, inference, quantization, and deployment of full-parameter fine-tuned models with other training frameworks.
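For context, the "messages" format referred to above is the OpenAI-style chat schema. A minimal sketch of a single sample follows; the field names use the common convention and are not guaranteed to match ms-swift's exact schema:

```python
import json

# One training sample in the messages format: a list of role/content turns.
# Field names follow the widely used OpenAI-style convention; the exact
# schema accepted by ms-swift may differ slightly.
sample = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
        {"role": "assistant", "content": "The capital of France is Paris."},
    ]
}

# Datasets are typically stored as one JSON object per line (.jsonl).
line = json.dumps(sample, ensure_ascii=False)
print(line)
```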
Thank you very much for your team's hard work! Regarding the third point, is there any plan to add a channel-loss-style monitoring feature, i.e., observing the loss trend separately for each downstream task's dataset? I see that version 2.5 already supports PT for MLLMs, and I think this feature is quite important for doing post pre-training of MLLMs. Hope you'll consider it :>
Hello, thank you very much for your open-source work! Are there any plans to support RAG in the future?
Yes, but it probably won't make it into 3.0; it will likely land around 3.1/3.2.
Hello, could you add an interface for custom evaluation metrics?
Yes, this is an important feature.
Will 3.0 include a complete demo for multi-GPU NPU adaptation?
Hello, will the training pipeline support TP, PP, etc.?
> Will 3.0 include a complete demo for multi-GPU NPU adaptation?

That depends on whether we can borrow the hardware 😊
Allow changing dataset column names from HuggingFace/ModelScope to a swift-supported format. Currently you have to download a dataset from Hugging Face, rename its columns, and re-upload it to use it with swift.
Also, swift's dataset preparation is very strict despite using --check_dataset_strategy none. It logs cryptic errors that do not explain what went wrong.
For example, it will not accept a dataset that has a User message as the last message.
Another example: it doesn't accept a dataset with repeating roles such as Assistant -> Assistant -> User.
It also complained about KeyError: 'conversations' in a dataset that didn't have a conversations column at all.
In a large dataset it's impossible to check and fix every row. There should be an option to continue despite these issues.
Great suggestion, thank you!
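The offline workaround described in the thread (download, rename columns, re-upload) can be sketched with the standard library alone. The source column names `instruction`/`output` below are hypothetical placeholders; substitute your dataset's actual columns:

```python
import json

# Hypothetical source rows whose column names swift does not recognize.
rows = [
    {"instruction": "Translate 'hello' to French.", "output": "bonjour"},
    {"instruction": "What is 2 + 2?", "output": "4"},
]

def to_messages(row):
    """Map one row onto the messages format swift understands."""
    return {
        "messages": [
            {"role": "user", "content": row["instruction"]},
            {"role": "assistant", "content": row["output"]},
        ]
    }

# Write the converted dataset as .jsonl, ready to re-upload or use locally.
with open("converted.jsonl", "w", encoding="utf-8") as f:
    for row in rows:
        f.write(json.dumps(to_messages(row), ensure_ascii=False) + "\n")
```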
> Hello, will the training pipeline support TP, PP, etc.?

Megatron support optimizations will come after the ms-swift 3.0 refactor, roughly 1-2 months from now.
> Thank you very much for your team's hard work! Regarding the third point, is there any plan to add a channel-loss-style monitoring feature, i.e., observing the loss trend separately for each downstream task's dataset?

OK, we will add this; it is a very common requirement. Thanks to you both.
Please include code for end-to-end fine-tuning / pre-training of audio language models in your existing pipeline, e.g. Llama 3.1 Omni. If possible, please also integrate the moshi audio language model.
channel loss: related issue: https://github.com/modelscope/ms-swift/issues/2220
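The channel-loss idea can be sketched in a few lines: tag each sample with a channel (e.g. its source dataset) and aggregate the loss per channel during training. This is an illustrative sketch, not ms-swift's implementation; the class and field names are invented:

```python
from collections import defaultdict

class ChannelLossTracker:
    """Accumulate a running mean of the loss separately for each channel."""

    def __init__(self):
        self.total = defaultdict(float)
        self.count = defaultdict(int)

    def update(self, channel, loss):
        self.total[channel] += loss
        self.count[channel] += 1

    def means(self):
        return {c: self.total[c] / self.count[c] for c in self.total}

# Illustrative per-batch losses tagged with their source dataset ("channel").
tracker = ChannelLossTracker()
for channel, loss in [("ocr", 0.8), ("vqa", 1.2), ("ocr", 0.6), ("vqa", 1.0)]:
    tracker.update(channel, loss)

print(tracker.means())  # per-channel loss trends can then be logged or plotted
```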
Could you build an official NPU version of the swift container image?
Can multimodal models support distributing GPU memory evenly across multiple cards?
DeepSpeed ZeRO-2/ZeRO-3 distributes it evenly.
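For context on the reply above: even memory distribution is what ZeRO stage 2/3 provides by partitioning model states across GPUs. A minimal sketch of a DeepSpeed ZeRO-3 config follows, with core keys only; real configs typically also set optimizer and offload options:

```python
import json

# Minimal DeepSpeed config enabling ZeRO stage 3, which partitions
# parameters, gradients, and optimizer states across all GPUs.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {"stage": 3},
}

with open("ds_zero3.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```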
After fine-tuning GOT-OCR2.0, I get a model_type error when using the saved fine-tuned model. How do I fix this?
You need to merge the LoRA; only then will there be a config.json file.
Solved, thanks! cd into the fine-tuned model directory and run: swift merge-lora --ckpt_dir xxx
Hello, when currently fine-tuning a multimodal large model with multi-turn dialogues configured, the .jsonl file saved after running infer only contains the result of the last turn in response, while the history field contains the label's history. Could infer later support saving the full results of all dialogue turns?
Looking forward to training and fine-tuning support for audio large models such as CosyVoice being added soon.
> Hello, will the training pipeline support TP, PP, etc.?
>
> Megatron support optimizations will come after the ms-swift 3.0 refactor, roughly 1-2 months from now.

Is there hope of support this year? 😂
https://github.com/modelscope/ms-swift/pull/2030/
The default version installed via pip is 2.6.1, and I can't find version 3.0 even when pinning the version; the source master branch is also 2.6.1. How do I install ms-swift 3.0?
> How do I download version 3.0 of the swift package? The currently downloaded package version is 2.6.1.

`pip install git+https://github.com/modelscope/ms-swift.git`
> Hello, will the training pipeline support TP, PP, etc.?
>
> Megatron support optimizations will come after the ms-swift 3.0 refactor, roughly 1-2 months from now.
>
> Is there hope of support this year? 😂

@Jintao-Huang Is there still a chance this year?