l15y

Results: 5 issues by l15y

enhancement

Only the 4-bit version is done here; 8-bit works the same way. In `model.py`, change the corresponding function to the following. On the first run, set `first_run` to 1. A switch could be added to the config, with the saved state detected automatically.

```
def prepare_model():
    import pickle
    from transformers import AutoModel
    global model
    if cmd_opts.precision == "int4":
        first_run = 0
        if first_run:
            model = AutoModel.from_pretrained(cmd_opts.model_path, trust_remote_code=True)
            model = model.half().quantize(4)
            print("Quantization finished")
            with open(cmd_opts.model_path + "int4", ...
```
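The manual `first_run` flag could be replaced by the automatic detection the issue suggests: check whether the quantized checkpoint file already exists before quantizing. A minimal sketch, assuming the `model_path + "int4"` save-path convention from the snippet above; the helper name `needs_quantization` is hypothetical:

```python
import os

def needs_quantization(model_path: str, bits: int = 4) -> bool:
    # Hypothetical helper: instead of a hand-edited first_run flag,
    # quantize only when no saved quantized checkpoint exists yet.
    # Path convention follows the snippet: model_path + "int4".
    return not os.path.exists(model_path + f"int{bits}")
```

`prepare_model()` could then branch on `needs_quantization(cmd_opts.model_path)` rather than a constant.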

### Feature request

Support for stream-style generation is needed.

### Motivation

Support for stream-style generation is needed.

### Your contribution

[wenda](https://github.com/wenda-LLM/wenda) already supports multi-turn conversation with Aquila, as well as the knowledge base, Auto, and other features. If stream generation is added, we will follow up immediately.
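The interface being requested can be illustrated with a toy sketch: instead of returning the full completion at once, the model yields tokens incrementally so a UI like wenda can render each chunk as it arrives. The functions and token strings below are illustrative stand-ins, not part of any Aquila API:

```python
from typing import Iterator

def generate_stream(prompt: str, max_new_tokens: int = 8) -> Iterator[str]:
    """Toy stand-in for a model's generate(): yields one decoded
    token at a time instead of returning the whole reply at the end."""
    for i in range(max_new_tokens):
        token = f"tok{i} "  # a real model would decode the next token here
        yield token         # the caller can display each piece immediately

def consume(prompt: str) -> str:
    """A chat UI would append each chunk to the window as it arrives."""
    out = ""
    for piece in generate_stream(prompt, max_new_tokens=3):
        out += piece
    return out
```

The key point is the generator contract: the caller sees partial output with low latency, while the final concatenated string is identical to a non-streaming call.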

### Feature Description

The current llama.cpp implementation doesn't optimally utilize NUMA architecture when running Mixture-of-Experts (MoE) models, potentially leaving significant performance gains untapped.

### Proposed Solution

Implement NUMA-aware expert allocation through...
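The allocation idea behind the proposal can be sketched in a few lines: distribute expert weights across NUMA nodes so that threads pinned to a node mostly touch node-local memory. This is an illustrative round-robin placement, not llama.cpp's actual scheme; the function name and return shape are assumptions:

```python
def assign_experts_to_numa_nodes(n_experts: int, n_nodes: int) -> dict:
    """Round-robin mapping of MoE expert indices to NUMA nodes, so each
    node holds roughly n_experts / n_nodes experts and worker threads
    pinned to a node access mostly node-local expert weights."""
    placement = {node: [] for node in range(n_nodes)}
    for expert in range(n_experts):
        placement[expert % n_nodes].append(expert)
    return placement
```

In a real implementation the placement would also have to account for expert activation frequency and interconnect bandwidth, but the round-robin baseline already avoids concentrating all experts on one node.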

enhancement
stale