Zikun Wu
> Due to limited bandwidth, this part of the model support hasn't been merged into the main branch yet. We plan to release this part soon, and we will release the...
@YixinSong-e I ran into this problem when running opt-6.7b on a 4080S GPU. The command was ./build/bin/main -m /share-data/wzk-1/model/powerinfer/opt-6.7b.powerinfer.gguf -n 32 -t 8 -p "Paris is the capital city of" --vram-budget 6.9...
Same question here: the README doesn't provide a predictor for OPT?
@drunkcoding Does it support Python 3.12?
@drunkcoding Sorry to disturb you again, but in my case llama.cpp is about 70% faster. The result below is just one case from "tasksource/bigbench"; other outputs...
@drunkcoding I built MoE-infinity from the latest commit on an RTX 4080 Super (16 GB). The script I used to test MoE-infinity is `examples/interface_example.py`. To begin the test, I downloaded `DeepSeek-V2-Lite-Chat` from...
@drunkcoding It seems that ExpertPredictor and the other expert-cache functions haven't been called yet, because I changed the code by adding `print("find_most_similar", time.time() - start_time)` here when build...
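For context, the instrumentation described above is a simple wall-clock timing print around a suspected dead code path. A minimal sketch of the idea (the `find_most_similar` body here is a hypothetical stand-in, not MoE-infinity's actual implementation):

```python
import time

def find_most_similar(query, candidates):
    """Hypothetical stand-in for an expert-cache similarity lookup."""
    # Pick the candidate numerically closest to the query value.
    return min(candidates, key=lambda c: abs(c - query))

start_time = time.time()
result = find_most_similar(3, [1, 2, 5])
# If this line never prints during a real run, the code path is not being hit.
print("find_most_similar", time.time() - start_time)
```

If the print never appears in the logs, the surrounding cache logic is likely being bypassed, which matches the observation in the comment above.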
Does PowerInfer only support Llama 2? Can it be extended to other models?