MiniCPM
MiniCPM4: Ultra-Efficient LLMs on End Devices, achieving 5x+ speedup on typical end-side chips
### Description / 描述 When deploying with powerinfer, running `pip install -r requirements.txt` fails with a message that torch cannot be installed on Android. ### Case Explanation / 案例解释 _No response_
### Feature request / 功能建议 Hi, I'm not sure where this change should be made. Full log:
```
Traceback (most recent call last):
  File "/root/mambaforge/envs/habitat-sim/lib/python3.9/site-packages/urllib3/connection.py", line 198, in _new_conn
    sock = connection.create_connection(
  File "/root/mambaforge/envs/habitat-sim/lib/python3.9/site-packages/urllib3/util/connection.py", line 60, in create_connection
    for...
```
### Description / 描述 I want to run inference by calling model(data) directly instead of model.chat(input). How should I prepare the data? ### Case Explanation / 案例解释 I can...
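A minimal sketch of one way to do this with plain transformers, assuming the tokenizer ships a chat template; the checkpoint path and generation settings below are illustrative assumptions, not the repo's documented API:

```python
# Sketch: build the same prompt that model.chat() would build, then call the
# model directly. Checkpoint path is an assumption for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "openbmb/MiniCPM-2B-sft-bf16"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path, torch_dtype=torch.bfloat16, trust_remote_code=True
).eval()

# model.chat() wraps the raw text in the chat template; to call the model
# yourself, apply the template explicitly and tokenize.
messages = [{"role": "user", "content": "Hello, who are you?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

with torch.no_grad():
    logits = model(inputs).logits                    # forward pass only, returns logits
    out = model.generate(inputs, max_new_tokens=64)  # full autoregressive generation

print(tokenizer.decode(out[0, inputs.shape[-1]:], skip_special_tokens=True))
```

The key point is that model(data) expects already-templated, tokenized input ids, whereas model.chat() does the templating and decoding for you.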
### Is there an existing issue ? / 是否已有相关的 issue ? - [X] I have searched, and there is no existing issue. / 我已经搜索过了,没有相关的 issue。 ### Describe the bug /...
Hello, I ran into a reproducibility problem when testing GSM8K with this framework. With the minicpm-2b-sft-bf16 model on the GSM8K task I only get 38.13:
```
"overall_result": {
    "accuracy": 0.3813495072024261
}
```
The configuration used is:
```
{
    "task_name": "gsm8k_gsm8k_gen",
    "path": "datasets/gsm8k/data/gsm8k.jsonl",
    "description": "",
    "transform": "datasets/gsm8k/transform_gen_v0.py",
    "fewshot": 8,
    "batch_size": 1,
    "generate": {
        "method": "generate",
        "params": "models/model_params/vllm_sample_v1.json",
...
```
### Feature request / 功能建议 [MiniCPM](https://github.com/OpenBMB/MiniCPM) is so good it moves me to tears! Strongly requesting an Ollama model as soon as possible, ideally published on the official Ollama site and compatible with the latest version of Ollama. Thanks in advance to whoever takes this on!
I tested it on a V100 and the speed is quite slow. Why is that?
### Feature request / 功能建议 Which framework can provide inference serving for mini-embedding and mini-reranker? Transformers inference does not seem to support async directly.
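Since transformers only exposes a blocking API, one common workaround is to run the encode call in a worker thread so an async server stays responsive. A minimal sketch, where the checkpoint name and the mean-pooling scheme are assumptions for illustration:

```python
# Sketch: offload blocking transformers inference to a thread via asyncio.
import asyncio
import torch
from transformers import AutoModel, AutoTokenizer

path = "openbmb/MiniCPM-Embedding"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModel.from_pretrained(path, trust_remote_code=True).eval()

def embed_sync(texts: list[str]) -> torch.Tensor:
    """Blocking embedding call: tokenize, forward, masked mean-pool over tokens."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(1) / mask.sum(1)

async def embed(texts: list[str]) -> torch.Tensor:
    # Run the blocking call in a worker thread so the event loop
    # (e.g. a FastAPI handler) is not blocked during inference.
    return await asyncio.to_thread(embed_sync, texts)

if __name__ == "__main__":
    vecs = asyncio.run(embed(["hello world", "你好"]))
    print(vecs.shape)
```

This does not parallelize GPU work; it only keeps the event loop free. For true concurrent serving, a dedicated inference server in front of the model is the usual choice.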
### Feature request / 功能建议 I'm deploying with vLLM using the following command:
```bash
vllm serve /hestia/model/MiniCPM3-4B --trust-remote-code --max-model-len 12288 --num-gpu-blocks-override 768 --port 8001 --max-num-seqs 32 --served-model-name minicpm --swap-space 0
```
A context length of 12288 already consumes 22 GB of GPU memory. The README mentions that LLM x MapReduce can process unbounded context with low memory. How do I enable it?
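The actual LLM x MapReduce pipeline ships as a separate project, but the idea behind it can be sketched against the OpenAI-compatible endpoint started above. The port and served model name are taken from the command in this issue; chunking by character count and the prompt wording are simplifying assumptions, not the project's real implementation:

```python
# Sketch of the map-reduce idea: split a long document into chunks that fit
# the serving context window, answer the question per chunk (map), then merge
# the partial answers with one final call (reduce).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8001/v1", api_key="EMPTY")

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="minicpm",  # served model name from the vllm serve command
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def map_reduce_answer(document: str, question: str, chunk_chars: int = 8000) -> str:
    chunks = [document[i:i + chunk_chars] for i in range(0, len(document), chunk_chars)]
    # Map: answer the question from each chunk independently.
    partials = [
        ask(f"Context:\n{c}\n\nQuestion: {question}\nAnswer using this context only.")
        for c in chunks
    ]
    # Reduce: merge the per-chunk answers into one final answer.
    merged = "\n".join(f"- {p}" for p in partials)
    return ask(f"Partial answers:\n{merged}\n\nCombine them into one final answer to: {question}")
```

Because only one chunk plus the merged partials ever sits in the context window at a time, peak memory stays bounded regardless of document length.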