PowerInfer
Fix offloading / VRAM budget bugs
After releasing online FFN offloading, we have identified the following new issues:
- [x] Decoding bug: #77.
- [x] Python module issue: #55, #78.
- [ ] Inaccuracy when offloading under a VRAM budget: #26, #38.
Some users have also reported errors related to FFN offloading on social media that may need further investigation.
We should also account for VRAM overhead under different batch sizes: as the batch size grows, CUDA OOM becomes likely during the prompt phase.
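To make the batch-size concern concrete, here is a minimal sketch of a prompt-phase VRAM check. All names (`prompt_phase_activation_bytes`, `fits_vram_budget`) and the cost formula are illustrative assumptions, not PowerInfer's actual accounting; real overhead depends on the kernels and quantization in use.

```python
def prompt_phase_activation_bytes(batch_size: int, seq_len: int,
                                  hidden_dim: int, n_layers: int,
                                  bytes_per_elem: int = 2) -> int:
    """Rough upper bound on prompt-phase activation memory (hypothetical model).

    Counts the FFN intermediate (assumed 4x hidden expansion) and the
    attention score matrix per layer, both in fp16. Illustrative only.
    """
    ffn = batch_size * seq_len * 4 * hidden_dim * bytes_per_elem
    attn = batch_size * seq_len * seq_len * bytes_per_elem
    return n_layers * (ffn + attn)

def fits_vram_budget(batch_size: int, seq_len: int, budget_bytes: int,
                     hidden_dim: int = 4096, n_layers: int = 32) -> bool:
    """Return True if the estimated prompt-phase activations fit the budget."""
    needed = prompt_phase_activation_bytes(batch_size, seq_len,
                                           hidden_dim, n_layers)
    return needed <= budget_bytes
```

Under these assumptions, batch size 1 at 512 tokens fits an 8 GiB budget comfortably, while batch size 32 does not, which matches the pattern of prompt-phase OOM reports above.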
Has this problem been solved? Running it here, we also see that gpu_offload does not preload the weights.
Step 1: the run errors out because there is no activation folder.
After manually adding a (fake) activation folder, running the Python script still fails with an error.