Zikun Wu
> Due to limited bandwidth, this part of the model support hasn't been merged into the main branch yet. We plan to release this part soon, and we will release the...
@YixinSong-e I ran into this problem when running opt-6.7b on a 4080S GPU. The command was ./build/bin/main -m /share-data/wzk-1/model/powerinfer/opt-6.7b.powerinfer.gguf -n 32 -t 8 -p "Paris is the capital city of" --vram-budget 6.9...
Same question here: the README doesn't provide a predictor for OPT?
@drunkcoding Does it support Python 3.12?
@drunkcoding Sorry to disturb you again, but in my case llama.cpp is about 70% faster. The result below is just one case from "tasksource/bigbench"; other outputs...
@drunkcoding I built MoE-infinity from the latest commit on an RTX 4080 Super (16 GB). The script I used to test MoE-infinity is `examples/interface_example.py`. To begin the test, I downloaded `DeepSeek-V2-Lite-Chat` from...
@drunkcoding It seems that ExpertPredictor and the other expert-cache functions haven't been called yet, because I changed the code by adding `print("find_most_similar", time.time() - start_time)` here when build...
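For context, the instrumentation described above is a simple wall-clock timing print around a suspected dead code path. A minimal sketch of the idea (the `find_most_similar` body here is a hypothetical stand-in, not MoE-infinity's actual implementation):

```python
import time

def find_most_similar(query, candidates):
    """Hypothetical stand-in for an expert-cache similarity lookup."""
    # Pick the candidate numerically closest to the query value.
    return min(candidates, key=lambda c: abs(c - query))

start_time = time.time()
result = find_most_similar(3, [1, 2, 5])
# If this line never prints during a real run, the code path is not being hit.
print("find_most_similar", time.time() - start_time)
```

If the print never appears in the logs, the surrounding cache logic is likely being bypassed, which matches the observation in the comment above.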
Does PowerInfer only support Llama 2? Can it be extended to other models?