tract
AVX512F optimized matrix multiplication
Hey @tgolsson!
The AVX512F question is back on the radar, with another team evaluating performance on this architecture. IIRC, you have already done the heavy lifting on this front, with decent results. Would it be possible to put these kernels back in play? I can help with finalizing the integration myself, but would hate to duplicate what you've already done...
Thanks a lot.
Hey! Yeah, I'd done a bunch of work, including some kernels... I'll try to clean it up and push it. It's not in a shippable state, but I'll happily contribute what I've done. :) I'll do it either later today or, more likely, over the weekend or on Monday.