
Fix offloading / VRAM budget bugs

Open hodlen opened this issue 1 year ago • 2 comments

After releasing online FFN offloading, we have found new issues in:

  • [x] Decoding bug: #77.
  • [x] Python module issue: #55, #78.
  • [ ] Inaccuracy when offloading under a VRAM budget: #26, #38.

Some users have also reported FFN offloading errors on social media that may need further investigation.

hodlen avatar Dec 26 '23 10:12 hodlen

We should also consider VRAM overhead under different batch sizes. As the batch size grows, CUDA OOM becomes more likely during the prompt phase.
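To illustrate why batch size matters for the budget, here is a minimal back-of-the-envelope sketch. It is not PowerInfer's actual accounting: the function names, the choice of buffers (one hidden-state and one FFN intermediate buffer per batch), and the fp32 element size are all assumptions made for illustration, and it ignores the KV cache, attention scores, and allocator overhead.

```python
def estimate_prompt_scratch_bytes(n_batch, n_embd, n_ff, bytes_per_elem=4):
    """Rough size of the activation buffers for one prompt batch.

    Hypothetical model: one hidden-state buffer (n_batch x n_embd) plus
    one FFN intermediate buffer (n_batch x n_ff). Real runtimes allocate
    more than this, so treat the result as a lower bound.
    """
    hidden = n_batch * n_embd * bytes_per_elem
    ffn = n_batch * n_ff * bytes_per_elem
    return hidden + ffn


def fits_budget(weights_bytes, n_batch, n_embd, n_ff, vram_budget_bytes):
    """Check whether offloaded weights plus scratch stay within the budget."""
    scratch = estimate_prompt_scratch_bytes(n_batch, n_embd, n_ff)
    return weights_bytes + scratch <= vram_budget_bytes


if __name__ == "__main__":
    # Example dimensions (assumed, roughly 7B-class): n_embd=4096, n_ff=11008.
    for n_batch in (32, 128, 512):
        mib = estimate_prompt_scratch_bytes(n_batch, 4096, 11008) / 2**20
        print(f"n_batch={n_batch}: ~{mib:.1f} MiB scratch")
```

Under this model the scratch space grows linearly with batch size, so a weight split that fits the VRAM budget during decoding can still exceed it in the prompt phase unless some headroom is reserved.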

hodlen avatar Dec 29 '23 08:12 hodlen

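Has this issue been resolved? When I run it directly, I also see that gpu_offload does not preload the weights. Step 1: it fails with an error saying the activation folder does not exist. [image]

After manually adding a (fake) activation folder, running the Python script still fails. [image]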

qw1319 avatar Jun 19 '24 02:06 qw1319