Moses Hu comments

Results: 22 comments by Moses Hu

Question about Alpaca training

> Yes, the labels for everything before the output need to be set to -100; tokenizer.padding_side should always be 'right'. For how to process SFT data, I suggest looking at how the Stanford Alpaca project builds its dataset. This is my training code ``` import copy import logging from dataclasses import dataclass, field from typing import Dict, Optional, Sequence import os import torch import transformers from peft import...

Question about Alpaca training

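> When training with peft + Trainer, the full weights are saved by default, so the saved files are large. For a workaround, see: > > https://github.com/ymcui/Chinese-LLaMA-Alpaca/blob/11e5d1fdb52aecf71c78ef6eae7c77e8b85d1537/scripts/run_clm_pt_with_peft.py#L559-L562 > > We process the data the same way as Stanford Alpaca. #269 also mentions the padding_side issue; take a look and see whether it helps. Thanks for the reply. Using left versus right makes little difference: padding left works quite well for me, and eos_token_id is recognized either way. As for #269, the padding side does not affect the loss. I ran into the same problem, mainly because the V100 does not support load_in_8bit. https://github.com/tloen/alpaca-lora/blob/main/finetune.py provides both options: masking with -100 and training without the -100 mask. Both Vicuna and Stanford Alpaca mask everything before the output with -100.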
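The prompt masking discussed in the two comments above can be sketched roughly as follows. This is a minimal illustration on plain token-id lists, not the code from Stanford Alpaca or alpaca-lora; the helper name `build_example` and the token ids are hypothetical. It relies only on the fact that -100 is the default `ignore_index` of PyTorch's cross-entropy loss, so positions labeled -100 contribute nothing to the loss.

```python
from typing import Dict, List

# -100 is the default ignore_index of PyTorch's CrossEntropyLoss,
# so tokens labeled with it are skipped when the loss is computed.
IGNORE_INDEX = -100


def build_example(
    prompt_ids: List[int],
    output_ids: List[int],
    eos_id: int,
    pad_id: int,
    max_len: int,
) -> Dict[str, List[int]]:
    """Hypothetical helper: build one right-padded SFT example.

    Labels for the prompt and for the padding are set to IGNORE_INDEX,
    so only the output tokens and the EOS token are trained on.
    """
    input_ids = (prompt_ids + output_ids + [eos_id])[:max_len]
    labels = ([IGNORE_INDEX] * len(prompt_ids) + output_ids + [eos_id])[:max_len]
    attention_mask = [1] * len(input_ids)

    # Right padding: pad tokens are appended after the real tokens.
    pad = max_len - len(input_ids)
    input_ids = input_ids + [pad_id] * pad
    labels = labels + [IGNORE_INDEX] * pad
    attention_mask = attention_mask + [0] * pad

    return {
        "input_ids": input_ids,
        "labels": labels,
        "attention_mask": attention_mask,
    }


# Made-up token ids, for illustration only.
example = build_example([11, 12, 13], [21, 22], eos_id=2, pad_id=0, max_len=8)
print(example["labels"])
```

Because padded positions are also labeled -100 and excluded by the attention mask, moving the padding to the left would change where the pad tokens sit but not which tokens are scored, which is consistent with the remark above that the padding side does not affect the loss.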

MMS transcribe audio.wav is error

> Have you pulled the latest fairseq-py master branch? My fairseq version is 0.12.2. Is that OK?

MMS transcribe audio.wav is error

> Have you pulled the latest fairseq-py master branch? I changed the torch version to 2.0.1 and got the same errors, but it runs successfully on Colab. So what is...

MMS transcribe audio.wav is error

> BTW, it should now be very simple to use MMS with `transformers`: > > See: > > * https://huggingface.co/docs/transformers/main/en/model_doc/mms > * [[MMS] Scaling Speech Technology to 1,000+ Languages |...

Using both LORA and FSDP results in error

I have the same errors. Did you solve it?

[Question] A simple conceptual question

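> The README says the model is based on the transformer architecture, so I assumed an encoder-decoder architecture; later it says it is similar to LLaMA, so in the end it is a decoder architecture. I misread this part, sorry. Their README is a bit vague. The structure is clearly about the same as Llama's, so why not just train a Chinese version of Llama from scratch? I really don't get it. Once Llama trains a multilingual version and it is commercialized, who will still use these models?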
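For readers with the same confusion: the practical difference between a decoder-only model such as LLaMA and the encoder side of an encoder-decoder transformer is the attention mask. The sketch below is illustrative only (the function names are hypothetical and it is not taken from any model's code); it builds both masks as nested lists, where 1 means "position i may attend to position j".

```python
from typing import List


def causal_mask(n: int) -> List[List[int]]:
    """Decoder-style mask: token i attends only to tokens 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def bidirectional_mask(n: int) -> List[List[int]]:
    """Encoder-style mask: every token attends to every token."""
    return [[1] * n for _ in range(n)]


mask = causal_mask(3)
for row in mask:
    print(row)
```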

How do I specify which config file to load? How do I configure a different LLM for each action?

> class Action(SerializationMixin, ContextMixin, BaseModel): > model_config = ConfigDict(arbitrary_types_allowed=True) > > name: str = "" > i_context: Union[ > dict, CodingContext, CodeSummarizeContext, TestingContext, RunCodeContext, CodePlanAndChangeContext, str, None > ] =...

How do I specify which config file to load? How do I configure a different LLM for each action?

I can also get the code to run in the following way, but I am not sure whether this approach causes other problems. Also, why is the ~/.metagpt/config2.yaml file read when Action or Role is imported? ``` python import re from metagpt.roles import Role from metagpt.configs.llm_config import LLMConfig from metagpt.config2 import Config from metagpt.schema import Message class CustomAction(Action): PROMPT_TEMPLATE: str = """ Write a...

Some optimization suggestions

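> I suggest improving the architecture? > > For example, use two `DeepSeek R1` models, one as an architect and one as a programmer. Use MCP (Model Context Protocol) to improve communication among the IDE, the human, and the models. MCP could also be extended with routing protocols, routing algorithms, routing tables, and similar techniques, so that communication among the four parties (two models, IDE, human) is more robust. A product manager is precisely what is not needed, but review and debugging can recursively call the `dual-model` pair, which reduces system resource consumption. Let me see, a database is also needed to preprocess the project and generate graph structures such as the CFG, DAG, and AST in advance, so that the agent can precisely locate the lines to add, delete, modify, or look up, or to debug, and can develop incrementally over multiple iterations. The point of putting all of this into one IDE or multi-purpose editor (such as VS Code) is that the agent should not naively regenerate the code from scratch every time; instead, the underlying model should think carefully and write only 1~2 lines of code at a time. Fortunately, chain-of-thought models can now do this,...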
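The idea of pre-generating an AST so that an agent can locate the exact lines to edit can be sketched with Python's standard `ast` module. This is only an assumption-laden illustration of the preprocessing step suggested above, not part of any existing tool: the helper name `index_functions` and the sample source are hypothetical, and a real system would presumably store the index in a database and also cover classes, calls, and control flow.

```python
import ast
from typing import Dict, Tuple


def index_functions(source: str) -> Dict[str, Tuple[int, int]]:
    """Hypothetical helper: map each function name to its (first, last) line."""
    tree = ast.parse(source)
    spans: Dict[str, Tuple[int, int]] = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # lineno and end_lineno are 1-based (end_lineno needs Python 3.8+).
            spans[node.name] = (node.lineno, node.end_lineno)
    return spans


# Made-up source file used only to demonstrate the index.
SAMPLE = (
    "def add(a, b):\n"
    "    return a + b\n"
    "\n"
    "\n"
    "def main():\n"
    "    total = add(1, 2)\n"
    "    print(total)\n"
)

spans = index_functions(SAMPLE)
print(spans)
```

With such an index, an agent asked to change `main` could be handed only lines 5 to 7 of the sample instead of regenerating the whole file.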