l15y

Results: 5 issues by l15y

enhancement

Only the 4-bit version is done here; 8-bit works the same way. In `model.py`, change the corresponding function to the following. On the first run, set `first_run` to 1. A switch could be added to the config, with the saved state detected automatically.

```
def prepare_model():
    import pickle
    from transformers import AutoModel
    global model
    if cmd_opts.precision == "int4":
        first_run = 0
        if first_run:
            model = AutoModel.from_pretrained(cmd_opts.model_path, trust_remote_code=True)
            model = model.half().quantize(4)
            print("Quantization finished")
            with open(cmd_opts.model_path + "int4", ...
```
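The manual `first_run` flag could be replaced by the automatic detection the issue suggests: check whether the quantized checkpoint file already exists before quantizing. A minimal sketch, assuming the `model_path + "int4"` save-path convention from the snippet above; the helper name `needs_quantization` is hypothetical:

```python
import os

def needs_quantization(model_path: str, bits: int = 4) -> bool:
    # Hypothetical helper: instead of a hand-edited first_run flag,
    # quantize only when no saved quantized checkpoint exists yet.
    # Path convention follows the snippet: model_path + "int4".
    return not os.path.exists(model_path + f"int{bits}")
```

`prepare_model()` could then branch on `needs_quantization(cmd_opts.model_path)` rather than a constant.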

### Feature request

Support for stream-style generation is needed.

### Motivation

Support for stream-style generation is needed.

### Your contribution

[wenda](https://github.com/wenda-LLM/wenda) already supports multi-turn conversation with Aquila, as well as the knowledge base, Auto, and other features. If stream generation is added, we will follow up immediately.
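The interface being requested can be illustrated with a toy sketch: instead of returning the full completion at once, the model yields tokens incrementally so a UI like wenda can render each chunk as it arrives. The functions and token strings below are illustrative stand-ins, not part of any Aquila API:

```python
from typing import Iterator

def generate_stream(prompt: str, max_new_tokens: int = 8) -> Iterator[str]:
    """Toy stand-in for a model's generate(): yields one decoded
    token at a time instead of returning the whole reply at the end."""
    for i in range(max_new_tokens):
        token = f"tok{i} "  # a real model would decode the next token here
        yield token         # the caller can display each piece immediately

def consume(prompt: str) -> str:
    """A chat UI would append each chunk to the window as it arrives."""
    out = ""
    for piece in generate_stream(prompt, max_new_tokens=3):
        out += piece
    return out
```

The key point is the generator contract: the caller sees partial output with low latency, while the final concatenated string is identical to a non-streaming call.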

### Feature Description

The current llama.cpp implementation doesn't optimally utilize NUMA architecture when running Mixture-of-Experts (MoE) models, potentially leaving significant performance gains untapped.

### Proposed Solution

Implement NUMA-aware expert allocation through...
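The allocation idea behind the proposal can be sketched in a few lines: distribute expert weights across NUMA nodes so that threads pinned to a node mostly touch node-local memory. This is an illustrative round-robin placement, not llama.cpp's actual scheme; the function name and return shape are assumptions:

```python
def assign_experts_to_numa_nodes(n_experts: int, n_nodes: int) -> dict:
    """Round-robin mapping of MoE expert indices to NUMA nodes, so each
    node holds roughly n_experts / n_nodes experts and worker threads
    pinned to a node access mostly node-local expert weights."""
    placement = {node: [] for node in range(n_nodes)}
    for expert in range(n_experts):
        placement[expert % n_nodes].append(expert)
    return placement
```

In a real implementation the placement would also have to account for expert activation frequency and interconnect bandwidth, but the round-robin baseline already avoids concentrating all experts on one node.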

enhancement
stale