HappyNews

7 comments by HappyNews

Have you solved your problem? I ran into the same error when using DeepSpeed. None of the solutions provided above worked. :(

In addition, in the `setup_env.py` file I just modified the `gen_code()` method so that it does the same thing as in the `get_model_name() == "BitNet-b1.58-2B-4T"` case.

> [@LiuZhihhxx](https://github.com/LiuZhihhxx) Have you solved this problem?

Not yet. It seems to be an essential step for downstream applications.

I am trying SFT for my downstream task. I think `Trainer` from `trl` may work.

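The vocabulary is 128k, the same size as llama3's, so Chinese tokenization should not be a problem. However, this model has had little training on Chinese corpora, so you need to fine-tune it yourself to align it.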

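It was not pre-trained on Chinese, and the screenshot clearly shows Chinese and English mixed together in the output. Without adding "reply in Chinese" to the prompt, the problem is even more obvious.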

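You can first fetch the full tool list from the server, filter the tools with your own custom logic, and then pass them in manually. It is not very elegant, but I have not seen another way so far. To fetch the tools, you can refer to the following code:

```python
import asyncio
from mcp.client.sse import sse_client
from mcp.client.session import ClientSession

url = 'http://localhost:7764/sse'

async def get_tools():
    async with sse_client(url) as streams:  # replace with the corresponding MCP address
        async with ClientSession(*streams) as...
```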
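For the "filter with your own custom logic" step, here is a minimal sketch. It assumes each fetched tool exposes `name` and `description` attributes. `ToolInfo` and `select_tools` are hypothetical names made up for illustration, not part of the MCP SDK, and `ToolInfo` only stands in for whatever objects the server returns so the snippet runs without a server.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class ToolInfo:
    """Hypothetical stand-in for a tool entry fetched from the server."""
    name: str
    description: str = ""


def select_tools(
    tools: Iterable[ToolInfo],
    allowed_names: Optional[Set[str]] = None,
    keyword: Optional[str] = None,
) -> List[ToolInfo]:
    """Keep only tools that pass every filter that was supplied."""
    selected = []
    for tool in tools:
        # Filter 1: explicit allow-list of tool names.
        if allowed_names is not None and tool.name not in allowed_names:
            continue
        # Filter 2: case-insensitive keyword match on the description.
        if keyword is not None and keyword.lower() not in (tool.description or "").lower():
            continue
        selected.append(tool)
    return selected


# Example data; in practice this would be the list fetched from the server.
all_tools = [
    ToolInfo("search", "Search the web"),
    ToolInfo("read_file", "Read a local file"),
    ToolInfo("write_file", "Write a local file"),
]

picked = select_tools(all_tools, allowed_names={"search", "read_file"})
print([t.name for t in picked])
```

The filtered list is then what you pass in manually instead of the full tool list. Keeping the filter as a plain function means it can be tested without a running server.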