
OrderedDemote2To() f64->f32 ?

Open Pflugshaupt opened this issue 1 year ago • 3 comments

I'm migrating my DSP codebase from my own attempt at a library to Highway at the moment. Things went mostly well, but I found one thing a bit puzzling: I have some algorithms that work on float lanes but have to do intermediate calculations at double precision. My own library allowed double-as-wide f64 aggregates for that, but I see that Highway won't do Twice<d> on full-width tags. That's fair enough, so I went with PromoteLowerTo() and PromoteUpperTo() to convert each float tag to two double tags. However, to go back to float later, I found that OrderedDemote2To() is curiously missing for double to float. Is there a specific reason for that, or am I missing some other function? I just want to convert N double lanes to N float lanes using half as many registers. It seems like something that would come up quite often with algorithms requiring full float precision results.

I ended up writing this, but it seems a bit silly:

        auto dbl2float = [](auto d, auto a, auto b) HWY_ATTR {
            const Half<decltype(d)> hd;
            return Combine(d, DemoteTo(hd, b), DemoteTo(hd, a));
        };
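For readers unfamiliar with the argument order of Combine (upper half first, then lower half), the lane order this lambda produces can be modeled in plain scalar C++. This is only an illustrative sketch, not Highway code: the name OrderedDemote2Model and the parameter kHalf are hypothetical, and std::array stands in for vector registers.

```cpp
#include <array>
#include <cstddef>

// Scalar model (hypothetical helper, not part of Highway) of the lane order
// produced by Combine(d, DemoteTo(hd, b), DemoteTo(hd, a)):
// the demoted lanes of `a` fill the lower half, those of `b` the upper half.
// kHalf is the number of f64 lanes in each input.
template <std::size_t kHalf>
std::array<float, 2 * kHalf> OrderedDemote2Model(
    const std::array<double, kHalf>& a, const std::array<double, kHalf>& b) {
  std::array<float, 2 * kHalf> out{};
  for (std::size_t i = 0; i < kHalf; ++i) {
    out[i] = static_cast<float>(a[i]);          // lower half comes from a
    out[kHalf + i] = static_cast<float>(b[i]);  // upper half comes from b
  }
  return out;
}
```

With inputs a = {a0, a1} and b = {b0, b1}, the model returns {a0, a1, b0, b1} converted to float, which is the ordering the lambda above is meant to achieve.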

Pflugshaupt avatar Dec 15 '23 21:12 Pflugshaupt

Hi, we don't have f64->f32 OrderedDemote2To because x86 and SVE can't do that very efficiently and we did not yet have a use-case.

However, RVV and NEON could do this a bit more efficiently. Would you be interested in having a go at adding support? That would involve updating quick_reference.md to mention that f64->f32 is supported, adding ForShrinkableVectors<TestFloatOrderedDemote2To>()(float()); in demote_test.cc:678, copying your implementation to generic_ops-inl.h with the usual #if (defined(HWY_NATIVE_ 'include guard', and adding implementations to rvv-inl.h and arm_neon-inl.h.

jan-wassenberg avatar Dec 18 '23 10:12 jan-wassenberg

Ok, I'll give it a try once I'm done migrating to Highway and have gained some more experience with it. That'll be in January. Thanks for letting me know I'm not missing a different way to do f64->f32. One issue might be that I have zero experience with RISC-V/RVV.

Pflugshaupt avatar Dec 18 '23 15:12 Pflugshaupt

Sounds good :) No worries, RVV already has an existing function for that; it may be enough simply to enable f64->f32 in the template SFINAE. It would also be fine to write a TODO instead; in the meantime, that target would be covered by the generic code.

jan-wassenberg avatar Dec 18 '23 16:12 jan-wassenberg