lingq1

Results: 10 comments by lingq1

> This case is really testing the model's reasoning ability rather than its function calling ability. cc @zhenruzhang > > I tested dashscope's qwen2-72b-instruct, and adding the instruction "First list the hosts that have all three tags Origin, Tcpping, and Exchange, then answer." seems to produce the correct answer "北京移动100移动". But for requirements like this, I think writing code is far more efficient than asking an LLM. LLMs are better at turning messy text input into structured data; their logical reasoning is usually unstable.

Thanks for the reply. The actual business logic is more complex than this example, and in the end this feature will be used by operations staff and others who cannot write code. Besides revising the prompt, are there good ways to improve the model's reasoning ability, such as configuration changes or fine-tuning? Does fine-tuning actually help here?
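A minimal sketch of the prompt tweak described in the quoted comment, assuming the DashScope Python SDK (`dashscope.Generation.call`) with `DASHSCOPE_API_KEY` set in the environment; the host inventory lines are made up for illustration:

```python
import dashscope  # pip install dashscope; reads DASHSCOPE_API_KEY from the env

# Illustrative host inventory, flattened into the prompt as plain text.
hosts = (
    "host: 北京移动100移动, tags: Origin, Tcpping, Exchange\n"
    "host: 上海电信200电信, tags: Origin, Tcpping\n"
)

# Forcing an explicit "list first, then answer" step nudges the model into
# step-by-step reasoning instead of a one-shot guess.
prompt = (
    f"{hosts}\n"
    "First list the hosts that have all three tags Origin, Tcpping, and "
    "Exchange, then answer: which host is it?"
)

response = dashscope.Generation.call(
    model="qwen2-72b-instruct",
    messages=[{"role": "user", "content": prompt}],
    result_format="message",
)
print(response.output.choices[0].message.content)
```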

> Hi @lingq1, I think it should work as long as we have sufficient GPU RAM. If not, we can run the pruning on CPU.

Thanks.
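A minimal sketch of the CPU fallback, assuming a plain `torch.nn.utils.prune` workflow rather than the project's actual pruning script; the checkpoint path and sparsity level are illustrative:

```python
import torch
import torch.nn.utils.prune as prune

# map_location="cpu" keeps the whole model out of GPU RAM during pruning.
model = torch.load("model.pt", map_location="cpu")

# 30% L1-unstructured pruning on every linear layer's weight matrix.
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # fold the mask into the tensor

torch.save(model, "model_pruned.pt")
```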

> Congrats on the successful fine-tune @lingq1! > > There are a couple ways to determine effectiveness of your model. > > 1. **Benchmarks** - These are standardized datasets where...
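For the benchmark route in the quoted advice, one hedged option is EleutherAI's lm-evaluation-harness (`pip install lm-eval`), which also makes a before/after comparison concrete; the checkpoint paths and task name below are assumptions:

```python
from lm_eval import simple_evaluate

# Score the base and distilled checkpoints on the same standardized task
# so the numbers are directly comparable.
for path in ("base-model/", "distilled-model/"):
    results = simple_evaluate(
        model="hf",
        model_args=f"pretrained={path}",
        tasks=["truthfulqa_mc2"],
    )
    print(path, results["results"]["truthfulqa_mc2"])
```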

@joecummings So is the application scenario I mentioned above simply a poor fit for distillation, or is there still room for parameter tuning?

> IIUC correctly, your model performed better **after** distillation on your benchmarking than before? In that case, it seems that distillation worked? > > To make sure though that this...

> Our LoRA recipe does not allow you to "LoRA-fy" the `gate_proj`, `up_proj`, or `down_proj` through the self attention modules b/c they are actually part of the MLP layer! You...
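In torchtune the MLP projections are toggled with a separate flag rather than listed in `lora_attn_modules`; a minimal sketch using the Llama 3.1 8B builder (model choice and hyperparameters are illustrative):

```python
from torchtune.models.llama3_1 import lora_llama3_1_8b

# gate_proj/up_proj/down_proj live in the MLP, so they are enabled with
# apply_lora_to_mlp instead of being listed under lora_attn_modules.
model = lora_llama3_1_8b(
    lora_attn_modules=["q_proj", "k_proj", "v_proj", "output_proj"],
    apply_lora_to_mlp=True,
    lora_rank=8,
    lora_alpha=16,
)
```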

> Interesting - that loss is definitely a little high. At a glance, I'd recommend trying out the following things: > > 1. Reducing the kd_ratio like the advice [here](https://github.com/pytorch/torchtune/issues/2117#issuecomment-2542540431)....
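For context on the first suggestion, `kd_ratio` weights the distillation term against plain cross-entropy; a hedged sketch of the blended objective, following the shape of torchtune's knowledge-distillation recipe with illustrative values:

```python
import torch.nn.functional as F

def kd_loss_step(student_logits, teacher_logits, labels, kd_ratio=0.25):
    """Blend hard-label cross-entropy with a KL distillation term."""
    ce_loss = F.cross_entropy(student_logits, labels)
    kd_loss = F.kl_div(
        F.log_softmax(student_logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    # A smaller kd_ratio shifts weight back toward the cross-entropy term.
    return (1 - kd_ratio) * ce_loss + kd_ratio * kd_loss
```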

> Your lr is also very small. Try 10x it to 1e-4

I'll try.
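A one-line sketch of that change, assuming an AdamW optimizer as in a typical fine-tuning config (the stand-in model is illustrative):

```python
import torch

model = torch.nn.Linear(8, 8)  # stand-in for the fine-tuned model

# 10x the learning rate, 1e-5 -> 1e-4, per the advice above.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
```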
