MiniCPM
MiniCPM4: Ultra-Efficient LLMs on End Devices, achieving 5+ speedup on typical end-side chips
Hello, if we want to continue pretraining your model on domain data (roughly 20B tokens), can we use the WSD learning-rate strategy starting from your post-annealing learning rate of 1e-3? That is, should we set 1e-3 as the maximum learning rate and then adjust the LR by step count according to your formula?
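The question above concerns a WSD (Warmup-Stable-Decay) schedule anchored at a 1e-3 maximum learning rate. A minimal sketch of such a schedule is below; the step counts, the minimum LR, and the exponential decay form are illustrative assumptions, not the exact formula from the MiniCPM paper.

```python
def wsd_lr(step, max_lr=1e-3, warmup_steps=1000,
           stable_steps=8000, decay_steps=1000, min_lr=1e-4):
    """Hypothetical WSD (Warmup-Stable-Decay) schedule sketch.

    Linear warmup to max_lr, a constant plateau, then exponential
    decay toward min_lr. All hyperparameters are assumptions.
    """
    if step < warmup_steps:
        # Linear warmup from 0 to max_lr.
        return max_lr * step / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable phase: hold the maximum learning rate.
        return max_lr
    # Decay phase: exponential interpolation from max_lr down to min_lr.
    t = min(step - warmup_steps - stable_steps, decay_steps) / decay_steps
    return max_lr * (min_lr / max_lr) ** t
```

In practice this kind of function can be passed to an optimizer wrapper such as PyTorch's `torch.optim.lr_scheduler.LambdaLR` (as a multiplicative factor of the base LR).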
### Is there an existing issue? - [X] I have searched, and there is no existing issue. ### Describe the bug /...
```python
def __init__(
    self,
    data_path,
    tokenizer,
    model_max_length=4096,
    user_tokens=[1786, 4194, 95388],
    assistant_tokens=[1786, 10850, 95388],
):
```
We try to do SFT on `MiniCPM-1B-sft-bp16`, but there is an error, because the token...
Loaded the model for inference with transformers on Linux and tested ordinary conversation; during the run I did not see support for features such as Function Call.
### Feature request / 功能建议 It seems only the text part is supported in llama.cpp, but the multimodal part is not. It would be really helpful if it could be supported,...
May I ask how the Ultrachat/sharegpt data is concatenated when it is put into pretraining?
### Feature request / 功能建议 Thanks for your great work. Could you provide more details on the proxy model? I have several questions. 1. Depth scale on the attention/mlp sub-block,...