PowerInfer
Fix offloading / VRAM budget bugs
After releasing online FFN offloading, we have identified the following new issues:
- [x] Decoding bug: #77.
- [x] Python module issue: #55, #78.
- [ ] Inaccuracy when offloading under a VRAM budget: #26, #38.
Some users have also reported errors related to FFN offloading on social media that may need further investigation.
We should also account for VRAM overhead under different batch sizes: as the batch size grows, CUDA OOM becomes likely during the prompt phase.
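To make the batch-size concern concrete, here is a minimal sketch of a prompt-phase VRAM check. All names (`prompt_phase_activation_bytes`, `fits_vram_budget`) and the cost formula are illustrative assumptions, not PowerInfer's actual accounting; real overhead depends on the kernels and quantization in use.

```python
def prompt_phase_activation_bytes(batch_size: int, seq_len: int,
                                  hidden_dim: int, n_layers: int,
                                  bytes_per_elem: int = 2) -> int:
    """Rough upper bound on prompt-phase activation memory (hypothetical model).

    Counts the FFN intermediate (assumed 4x hidden expansion) and the
    attention score matrix per layer, both in fp16. Illustrative only.
    """
    ffn = batch_size * seq_len * 4 * hidden_dim * bytes_per_elem
    attn = batch_size * seq_len * seq_len * bytes_per_elem
    return n_layers * (ffn + attn)

def fits_vram_budget(batch_size: int, seq_len: int, budget_bytes: int,
                     hidden_dim: int = 4096, n_layers: int = 32) -> bool:
    """Return True if the estimated prompt-phase activations fit the budget."""
    needed = prompt_phase_activation_bytes(batch_size, seq_len,
                                           hidden_dim, n_layers)
    return needed <= budget_bytes
```

Under these assumptions, batch size 1 at 512 tokens fits an 8 GiB budget comfortably, while batch size 32 does not, which matches the pattern of prompt-phase OOM reports above.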
Has this problem been solved? Running it here, we also see that gpu_offload does not preload the weights.
Step 1: the run errors out because there is no activation folder.
After manually adding a (fake) activation folder, running the Python script still fails with an error.