tract

DepthWise conv Inner loop f16 support

Open VariantXYZ opened this issue 2 years ago • 27 comments

https://github.com/Rikorose/DeepFilterNet/pull/211#issuecomment-1353637586

I dug into why I was seeing so many f32/f16 conversions even though the A55 supports FP16 storage and arithmetic, and it seems this is simply a limitation of Rust's f16 support.
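A minimal std-only sketch of where those conversions come from, assuming the usual software-f16 pattern: Rust has no stable f16 primitive, so an f16 type stores 16 bits and implements each arithmetic operation by widening to f32, computing, and narrowing back. The names `f16_to_f32`, `f32_to_f16` and `emulated_f16_dot` are hypothetical illustrations, not code from tract or the half crate.

```rust
/// Widen IEEE 754 binary16 bits to f32. Exact for every input.
fn f16_to_f32(h: u16) -> f32 {
    let sign_bit = (h as u32 & 0x8000) << 16;
    let exp = ((h >> 10) & 0x1f) as u32;
    let man = (h & 0x3ff) as u32;
    match exp {
        // Zero or subnormal: value = man * 2^-24, exactly representable in f32.
        0 => {
            let v = man as f32 / 16_777_216.0;
            if sign_bit != 0 { -v } else { v }
        }
        // Infinity or NaN: keep the payload in the top mantissa bits.
        0x1f => f32::from_bits(sign_bit | 0x7f80_0000 | (man << 13)),
        // Normal: rebias the exponent (127 - 15 = 112) and widen the mantissa.
        _ => f32::from_bits(sign_bit | ((exp + 112) << 23) | (man << 13)),
    }
}

/// Narrow an f32 to binary16 bits, rounding to nearest with ties to even.
fn f32_to_f16(x: f32) -> u16 {
    let bits = x.to_bits();
    let sign = ((bits >> 16) & 0x8000) as u16;
    let exp = ((bits >> 23) & 0xff) as i32;
    let man = bits & 0x7f_ffff;
    if exp == 0xff {
        // Infinity stays infinity; any NaN becomes a quiet NaN.
        let quiet: u16 = if man != 0 { 0x200 } else { 0 };
        return sign | 0x7c00 | quiet;
    }
    let e = exp - 112; // rebias from f32 (127) to f16 (15)
    if e >= 0x1f {
        return sign | 0x7c00; // too large: overflow to infinity
    }
    if e <= 0 {
        // The result is an f16 subnormal or zero.
        if e < -10 {
            return sign; // below half the smallest subnormal: rounds to zero
        }
        let m = man | 0x80_0000; // restore the implicit leading 1
        let shift = (14 - e) as u32; // between 14 and 24
        let hm = (m >> shift) as u16;
        let rem = m & ((1 << shift) - 1);
        let halfway = 1 << (shift - 1);
        let round_up = rem > halfway || (rem == halfway && (hm & 1) == 1);
        return sign | (hm + round_up as u16);
    }
    let hm = (man >> 13) as u16;
    let rem = man & 0x1fff;
    let round_up = rem > 0x1000 || (rem == 0x1000 && (hm & 1) == 1);
    // A carry out of the mantissa bumps the exponent, possibly up to infinity.
    (sign | ((e as u16) << 10) | hm) + round_up as u16
}

/// One output tap of a depthwise-conv-style inner loop, emulating f16 math the
/// way a software f16 type does: every multiply and every add widens its
/// operands to f32, computes in f32, and narrows the result back to f16.
/// Returns the f16 result bits and the number of conversions performed.
fn emulated_f16_dot(input: &[u16], kernel: &[u16]) -> (u16, usize) {
    let mut conversions = 0;
    let mut acc: u16 = 0; // +0.0 in binary16
    for (&x, &w) in input.iter().zip(kernel) {
        // Multiply: widen x and w, multiply in f32, narrow the product.
        let prod = f32_to_f16(f16_to_f32(x) * f16_to_f32(w));
        conversions += 3;
        // Add: widen acc and prod, add in f32, narrow the sum.
        acc = f32_to_f16(f16_to_f32(acc) + f16_to_f32(prod));
        conversions += 3;
    }
    (acc, conversions)
}
```

In this sketch each tap pays six conversions around two scalar f32 operations, which is consistent with the conversion overhead described above.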

To take full advantage of FP16, I think these conversions have to be avoided, though I'm not sure what the best solution is.

Maybe just rewrite the inner loop in assembly for f16, and use it only when the CPU reports f16 support?
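A minimal sketch of the runtime-dispatch part of that idea, assuming the kernel is selected once up front rather than per call. `std::arch::is_aarch64_feature_detected!("fp16")` is the standard library's runtime check for Armv8.2 half-precision arithmetic; `DotKernel`, `dot_portable`, `dot_native_fp16` and `select_kernel` are hypothetical, and `dot_native_fp16` is only a placeholder where hand-written assembly would go.

```rust
/// Hypothetical signature shared by every implementation of the inner loop:
/// f16 inputs and weights as raw binary16 bits, f32 result.
type DotKernel = fn(&[u16], &[u16]) -> f32;

/// Widen binary16 bits to f32 (exact for every input).
fn f16_to_f32(h: u16) -> f32 {
    let sign_bit = (h as u32 & 0x8000) << 16;
    let exp = ((h >> 10) & 0x1f) as u32;
    let man = (h & 0x3ff) as u32;
    match exp {
        0 => {
            let v = man as f32 / 16_777_216.0; // subnormal or zero
            if sign_bit != 0 { -v } else { v }
        }
        0x1f => f32::from_bits(sign_bit | 0x7f80_0000 | (man << 13)),
        _ => f32::from_bits(sign_bit | ((exp + 112) << 23) | (man << 13)),
    }
}

/// Portable fallback: widen each operand once and accumulate in f32.
fn dot_portable(input: &[u16], kernel: &[u16]) -> f32 {
    input.iter().zip(kernel).map(|(&x, &w)| f16_to_f32(x) * f16_to_f32(w)).sum()
}

/// Placeholder for a hand-written FP16 kernel (e.g. inline `asm!` using
/// half-precision vector instructions). Here it just forwards to the portable
/// version so the sketch runs on any target.
fn dot_native_fp16(input: &[u16], kernel: &[u16]) -> f32 {
    dot_portable(input, kernel)
}

/// Runtime check for FP16 arithmetic support on AArch64.
#[cfg(target_arch = "aarch64")]
fn cpu_has_fp16_arith() -> bool {
    std::arch::is_aarch64_feature_detected!("fp16")
}

/// Other architectures never take the FP16 path in this sketch.
#[cfg(not(target_arch = "aarch64"))]
fn cpu_has_fp16_arith() -> bool {
    false
}

/// Choose the kernel once, e.g. when the model is loaded.
fn select_kernel() -> DotKernel {
    if cpu_has_fp16_arith() {
        dot_native_fp16 as DotKernel
    } else {
        dot_portable
    }
}
```

Storing the choice as a plain function pointer keeps the feature check out of the hot loop.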

Overriding the operators in the half crate might work too.
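One caveat on overriding: Rust's orphan rule does not allow another crate to implement a foreign trait such as `std::ops::Mul` for a foreign type such as `half::f16`, and half already provides those impls, so in practice "overriding" means wrapping the bits in a local newtype. A minimal sketch of one such design, assuming a hypothetical `F16Bits` newtype: its `Mul` returns the widened f32 product (exact for finite binary16 inputs, since an 11-bit by 11-bit significand product fits in f32's 24 bits), so the loop accumulates in f32 and never narrows inside the loop. This still widens every operand and does not use FP16 arithmetic instructions; it only removes the per-operation narrowing.

```rust
use std::ops::Mul;

/// Hypothetical local newtype over raw IEEE 754 binary16 bits.
#[derive(Clone, Copy, Debug, PartialEq)]
struct F16Bits(u16);

impl F16Bits {
    /// Widen to f32 (exact for every binary16 value).
    fn to_f32(self) -> f32 {
        let h = self.0;
        let sign_bit = (h as u32 & 0x8000) << 16;
        let exp = ((h >> 10) & 0x1f) as u32;
        let man = (h & 0x3ff) as u32;
        match exp {
            0 => {
                let v = man as f32 / 16_777_216.0; // subnormal or zero
                if sign_bit != 0 { -v } else { v }
            }
            0x1f => f32::from_bits(sign_bit | 0x7f80_0000 | (man << 13)),
            _ => f32::from_bits(sign_bit | ((exp + 112) << 23) | (man << 13)),
        }
    }
}

/// Multiplying two f16 values yields an f32 instead of narrowing back to f16.
impl Mul for F16Bits {
    type Output = f32;
    fn mul(self, rhs: F16Bits) -> f32 {
        self.to_f32() * rhs.to_f32()
    }
}

/// Inner loop: two widenings per tap, no narrowing inside the loop; the
/// caller can narrow the f32 result once if it needs f16 output.
fn dot_f32_acc(input: &[F16Bits], kernel: &[F16Bits]) -> f32 {
    let mut acc = 0.0f32;
    for (&x, &w) in input.iter().zip(kernel) {
        acc += x * w;
    }
    acc
}
```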

VariantXYZ, Dec 15 '22 20:12