ncnn
feature plan
- purge onnx2ncnn (https://github.com/Tencent/ncnn/pull/6324)
- arch: layer support any packing (https://github.com/Tencent/ncnn/pull/6392)
- arch: optional packing in vulkan layer (https://github.com/Tencent/ncnn/pull/6389)
- feat: pnnx input npy files
- feat: vulkan pipeline cache
- opt: vulkan reduction optimization
maybe we can introduce weight-only quant
8-bit quantization is already supported; could 4-bit quantization be supported as well? The 4-bit quantization formula in mlx is at https://github.com/ml-explore/mlx/blob/3f730e77aa3d14e3d52688b8bd6a24bace500166/python/src/ops.cpp#L4238
Values are quantized in groups of 64, with each group of 64 values sharing a single scale and bias.
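The grouped scheme described above (64 values sharing one scale and one bias) can be sketched roughly as follows. This is a hypothetical illustration of group-wise 4-bit affine quantization, not mlx's or ncnn's actual implementation; the function names and the min/max calibration choice are assumptions.

```python
import numpy as np

def quantize_4bit(w, group_size=64):
    # Hypothetical sketch: split the weights into groups of 64 values;
    # each group shares one scale and one bias.
    # q = round((w - bias) / scale), with q stored in 4 bits (0..15).
    w = w.reshape(-1, group_size)
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    scale = (hi - lo) / 15.0
    scale[scale == 0] = 1.0  # avoid division by zero for constant groups
    bias = lo
    q = np.clip(np.round((w - bias) / scale), 0, 15).astype(np.uint8)
    return q, scale, bias

def dequantize_4bit(q, scale, bias):
    # Reconstruct: w_hat = q * scale + bias, per group.
    return q.astype(np.float32) * scale + bias

w = np.random.randn(4, 64).astype(np.float32)
q, scale, bias = quantize_4bit(w)
w_hat = dequantize_4bit(q, scale, bias)
# Per-element error is bounded by half the group's quantization step.
assert np.max(np.abs(w - w_hat)) <= np.max(scale) / 2 + 1e-6
```

In practice two 4-bit codes would be packed into one byte for storage; that packing step is omitted here for clarity.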
- [ ] block quantization https://github.com/Tencent/ncnn/pull/6439
Great! 4-bit quantization is coming soon!