ncnn
feature plan
- purge onnx2ncnn (https://github.com/Tencent/ncnn/pull/6324)
- arch: layer support any packing (https://github.com/Tencent/ncnn/pull/6392)
- arch: optional packing in vulkan layer (https://github.com/Tencent/ncnn/pull/6389)
- feat: pnnx input npy files
- feat: vulkan pipeline cache
- opt: vulkan reduction optimization
maybe we can introduce weight-only quant
8-bit quantization is already supported; could 4-bit quantization be supported as well? The 4-bit quantization formula in mlx is at https://github.com/ml-explore/mlx/blob/3f730e77aa3d14e3d52688b8bd6a24bace500166/python/src/ops.cpp#L4238
Values are quantized in groups of 64, with each group of 64 values sharing a single scale and bias.
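The grouped scheme described above (64 values sharing one scale and one bias) can be sketched roughly as follows. This is a hypothetical illustration of group-wise 4-bit affine quantization, not mlx's or ncnn's actual implementation; the function names and the min/max calibration choice are assumptions.

```python
import numpy as np

def quantize_4bit(w, group_size=64):
    # Hypothetical sketch: split the weights into groups of 64 values;
    # each group shares one scale and one bias.
    # q = round((w - bias) / scale), with q stored in 4 bits (0..15).
    w = w.reshape(-1, group_size)
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    scale = (hi - lo) / 15.0
    scale[scale == 0] = 1.0  # avoid division by zero for constant groups
    bias = lo
    q = np.clip(np.round((w - bias) / scale), 0, 15).astype(np.uint8)
    return q, scale, bias

def dequantize_4bit(q, scale, bias):
    # Reconstruct: w_hat = q * scale + bias, per group.
    return q.astype(np.float32) * scale + bias

w = np.random.randn(4, 64).astype(np.float32)
q, scale, bias = quantize_4bit(w)
w_hat = dequantize_4bit(q, scale, bias)
# Per-element error is bounded by half the group's quantization step.
assert np.max(np.abs(w - w_hat)) <= np.max(scale) / 2 + 1e-6
```

In practice two 4-bit codes would be packed into one byte for storage; that packing step is omitted here for clarity.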
- [ ] block quantization https://github.com/Tencent/ncnn/pull/6439
Great! 4-bit quantization is coming soon!