
feature plan

Open nihui opened this issue 5 months ago • 4 comments

  • purge onnx2ncnn (https://github.com/Tencent/ncnn/pull/6324)
  • arch: layer support any packing (https://github.com/Tencent/ncnn/pull/6392)
  • arch: optional packing in vulkan layer (https://github.com/Tencent/ncnn/pull/6389)
  • feat: pnnx input npy files
  • feat: vulkan pipeline cache
  • opt: vulkan reduction optimization

nihui avatar Sep 16 '25 02:09 nihui

Maybe we can introduce weight-only quantization.
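Weight-only quantization typically means storing the weights in a low-bit integer format while keeping activations in floating point, dequantizing weights at compute time. A minimal NumPy sketch of the idea (per-output-channel symmetric int8, a common choice; the function names and layout here are illustrative, not ncnn API):

```python
import numpy as np

def quantize_weights_int8(W):
    """Per-output-channel symmetric int8 quantization of a weight matrix.

    Each row (output channel) gets its own scale so that the largest
    magnitude in the row maps to 127.
    """
    scale = np.abs(W).max(axis=1, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # guard all-zero rows
    q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
    return q, scale

def linear_weight_only(x, q, scale):
    """Matrix multiply with fp32 activations and int8 weights.

    Weights are dequantized on the fly; only the stored weights shrink 4x.
    """
    return x @ (q.astype(np.float32) * scale).T

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16)).astype(np.float32)
x = rng.standard_normal((4, 16)).astype(np.float32)

q, s = quantize_weights_int8(W)
y = linear_weight_only(x, q, s)  # close to x @ W.T, within quantization error
```

A real kernel would of course fuse the dequantization into the GEMM rather than materializing the fp32 weight matrix.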

futz12 avatar Sep 16 '25 11:09 futz12

8-bit quantization is already supported; could 4-bit quantization be supported as well? The 4-bit quantization formula used by mlx is here: https://github.com/ml-explore/mlx/blob/3f730e77aa3d14e3d52688b8bd6a24bace500166/python/src/ops.cpp#L4238

Values are quantized in groups of 64, and all 64 values in a group share one scale and one bias.
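Based on the description above (groups of 64, one shared scale and bias per group), this is an affine block-quantization scheme. A minimal NumPy sketch under that assumption, with the bias taken as the group minimum; this is illustrative and not the exact mlx or ncnn implementation:

```python
import numpy as np

GROUP_SIZE = 64          # values per group, as described above
BITS = 4
LEVELS = (1 << BITS) - 1  # 15: the largest 4-bit code

def quantize_blocks(w):
    """4-bit affine quantization: each group of 64 shares one scale and bias."""
    assert w.size % GROUP_SIZE == 0
    g = w.reshape(-1, GROUP_SIZE).astype(np.float32)
    bias = g.min(axis=1, keepdims=True)          # one bias per group
    scale = (g.max(axis=1, keepdims=True) - bias) / LEVELS
    scale = np.where(scale == 0, 1.0, scale)     # guard constant groups
    q = np.clip(np.round((g - bias) / scale), 0, LEVELS).astype(np.uint8)
    return q, scale, bias

def dequantize_blocks(q, scale, bias):
    """Reconstruct approximate fp32 values from codes, scales, and biases."""
    return q.astype(np.float32) * scale + bias

rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)

q, scale, bias = quantize_blocks(w)
w_hat = dequantize_blocks(q, scale, bias).reshape(w.shape)
# per-value reconstruction error is at most half a quantization step
```

In a packed storage format, two 4-bit codes would share one byte, so 64 weights cost 32 bytes of codes plus one scale and one bias per group.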

csukuangfj avatar Sep 17 '25 02:09 csukuangfj

8-bit quantization is already supported; could 4-bit quantization be supported as well? The 4-bit quantization formula used by mlx is here: https://github.com/ml-explore/mlx/blob/3f730e77aa3d14e3d52688b8bd6a24bace500166/python/src/ops.cpp#L4238

Values are quantized in groups of 64, and all 64 values in a group share one scale and one bias.

  • [ ] block quantization https://github.com/Tencent/ncnn/pull/6439

nihui avatar Dec 03 '25 09:12 nihui

Wonderful! 4-bit quantization is coming soon!

csukuangfj avatar Dec 03 '25 09:12 csukuangfj