libnds
Hardware accelerated floating point operations
Related to #57. Please let me know if you need tests for these functions, or if they should handle NaN, subnormal numbers, or overflow. These also don't include rounding. Ideally these should end up as ARM code. You need
-Wl,--use-blx,--wrap="__aeabi_fmul",--wrap="__aeabi_fdiv",--wrap="sqrtf" -u sqrtf -u __aeabi_fmul -u __aeabi_fdiv
in the linker flags if you want calls to these routines to be redirected automatically. You may also need to add a declaration or include their header file.
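As an illustration of the division case, here is a minimal sketch (not the code from this PR) of a float divide built on one integer division of the 24-bit significands, which is the kind of 64-by-32-bit division the DS hardware divider performs. Assumptions: plain C `/` on `uint64_t` stands in for the hardware divider, the name `fdiv_truncating` is hypothetical, only normal operands take the fast path, and the quotient is truncated rather than rounded, as in the PR. With `-Wl,--wrap=__aeabi_fdiv`, GNU ld resolves undefined references to `__aeabi_fdiv` to `__wrap___aeabi_fdiv` and exposes the original as `__real___aeabi_fdiv`, so a real wrapper would use that name and call `__real___aeabi_fdiv` on its fallback paths.

```c
#include <stdint.h>
#include <string.h>

static uint32_t f2u(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float u2f(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* True for normal numbers: exponent field is neither 0 (zero/subnormal)
 * nor 255 (infinity/NaN). */
static int is_normal_bits(uint32_t b)
{
    uint32_t e = (b >> 23) & 0xFF;
    return e != 0 && e != 0xFF;
}

/* Truncating float division (hypothetical name). In a --wrap build this would
 * be __wrap___aeabi_fdiv and the fallbacks would call __real___aeabi_fdiv(a, b):
 * on a soft-float target, writing 'a / b' there calls __aeabi_fdiv, which the
 * linker redirects back into the wrapper. */
float fdiv_truncating(float a, float b)
{
    uint32_t ab = f2u(a), bb = f2u(b);
    if (!is_normal_bits(ab) || !is_normal_bits(bb))
        return a / b;                            /* zero, subnormal, inf, NaN */

    uint32_t sign = (ab ^ bb) & 0x80000000u;
    int ea = (int)((ab >> 23) & 0xFF);           /* biased exponents */
    int eb = (int)((bb >> 23) & 0xFF);
    uint64_t ma = (ab & 0x7FFFFFu) | 0x800000u;  /* 24-bit significands */
    uint32_t mb = (bb & 0x7FFFFFu) | 0x800000u;

    /* 48-bit by 24-bit integer division: fits a 64-by-32-bit divider. */
    uint64_t q = (ma << 24) / mb;                /* q in [2^23, 2^25) */
    int e = ea - eb + 127;
    if (q >= ((uint64_t)1 << 24))
        q >>= 1;                                 /* drop the low bit: truncation */
    else
        e -= 1;                                  /* significand quotient was below 1 */

    if (e < 1 || e > 254)
        return a / b;                            /* overflow or underflow */
    return u2f(sign | ((uint32_t)e << 23) | ((uint32_t)q & 0x7FFFFFu));
}
```

For example, `fdiv_truncating(1.0f, 3.0f)` yields the bit pattern `0x3EAAAAAA`, one ulp below the correctly rounded `1.0f / 3.0f` (`0x3EAAAAAB`), which is the accuracy cost of skipping rounding.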
The fmul function could technically be one instruction shorter if the mantissa multiplication were done differently, but judging from the generated assembly, that version would add an interlock after the long multiply, because its result would be used directly by the following clz instruction, so I don't think the change would help. I can also add a usage example to nds-examples if needed. A common use case for floating point in general is evaluating higher-order polynomials, which is the primary reason I added a mul implementation.
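A minimal sketch of the multiply path described above, assuming normal operands and truncation instead of rounding (hypothetical names, not the PR's code): the two 24-bit significands go through one 32×32→64-bit multiply (a `umull` on the DS's ARMv5TE core), and a count-leading-zeros on the product decides whether to shift by 23 or 24 bits. `__builtin_clzll` is the GCC/Clang builtin for that count. As with the divide, the `a * b` fallbacks are host-side stand-ins; inside a `--wrap=__aeabi_fmul` wrapper they would have to call `__real___aeabi_fmul` instead. The Horner-rule evaluator shows the polynomial use case.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static uint32_t f2u(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float u2f(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* True for normal numbers: exponent field is neither 0 nor 255. */
static int is_normal_bits(uint32_t b)
{
    uint32_t e = (b >> 23) & 0xFF;
    return e != 0 && e != 0xFF;
}

/* Truncating float multiply (hypothetical name). */
float fmul_truncating(float a, float b)
{
    uint32_t ab = f2u(a), bb = f2u(b);
    if (!is_normal_bits(ab) || !is_normal_bits(bb))
        return a * b;                                /* zero, subnormal, inf, NaN */

    uint32_t sign = (ab ^ bb) & 0x80000000u;
    int e = (int)((ab >> 23) & 0xFF) + (int)((bb >> 23) & 0xFF) - 127;
    uint64_t p = (uint64_t)((ab & 0x7FFFFFu) | 0x800000u)
               * ((bb & 0x7FFFFFu) | 0x800000u);     /* p in [2^46, 2^48) */

    int top = 63 - __builtin_clzll(p);               /* leading 1 is bit 46 or 47 */
    e += top - 46;                                   /* +1 if significand product >= 2 */
    uint32_t m = (uint32_t)(p >> (top - 23));        /* keep 24 bits, truncate the rest */

    if (e < 1 || e > 254)
        return a * b;                                /* overflow or underflow */
    return u2f(sign | ((uint32_t)e << 23) | (m & 0x7FFFFFu));
}

/* Horner's rule: c[0] + x*(c[1] + x*(c[2] + ...)), one multiply per step. */
float poly_eval(const float *c, int n, float x)
{
    assert(n >= 1);
    float acc = c[n - 1];
    for (int i = n - 2; i >= 0; i--)
        acc = fmul_truncating(acc, x) + c[i];
    return acc;
}
```

Only the multiplies in `poly_eval` go through the sketch; the additions still use the regular soft-float routine.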
If there are any other issues please let me know.
I probably need to do some detailed benchmarking. The division and multiplication functions may not perform well enough; the benchmark I posted earlier used the timers incorrectly.
I did some accuracy testing: with rounding added, sqrtf is 100% identical to the existing sqrtf from the math library, so at least that one could be considered. I'm much less certain about the div and mul functions.
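The rounding step for sqrtf can be sketched as follows (hypothetical names, positive normal inputs only; a software integer square root stands in for the hardware unit, which is assumed here to return only the truncated root). With r = floor(sqrt(M)), the exact root exceeds r + 1/2 exactly when M - r² > r, and ties cannot occur, so one multiply and one compare suffice. Instead of comparing against libm, the checker tests correct rounding directly: IEEE 754 requires sqrt to be correctly rounded, so a conforming sqrtf must return exactly that value. The midpoints between a float and its neighbours have at most 25 significant bits, so their squares are exact in `double`.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

static uint32_t f2u(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float u2f(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* floor(sqrt(n)) by the digit-by-digit method. */
static uint64_t isqrt64(uint64_t n)
{
    uint64_t root = 0, bit = (uint64_t)1 << 62;
    while (bit > n)
        bit >>= 2;
    while (bit != 0) {
        if (n >= root + bit) {
            n -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    return root;
}

/* Round-to-nearest square root for positive normal inputs (hypothetical name). */
float sqrtf_rounded(float x)
{
    assert(x >= FLT_MIN && x <= FLT_MAX);
    uint32_t b = f2u(x);
    int e = (int)((b >> 23) & 0xFF) - 127;      /* unbiased exponent */
    uint64_t m = (b & 0x7FFFFFu) | 0x800000u;   /* 24-bit significand */
    int odd = e & 1;
    uint64_t M = m << (23 + odd);               /* x == M * 2^(e-46-odd), even power */
    uint64_t r = isqrt64(M);                    /* r in [2^23, 2^24) */
    if (M - r * r > r)                          /* sqrt(M) > r + 1/2, so round up */
        r++;                                    /* never reaches 2^24 for M < 2^48 */
    int re = (e - odd) / 2;                     /* floor(e / 2) */
    return u2f(((uint32_t)(re + 127) << 23) | (uint32_t)(r - 0x800000u));
}

/* True if y is the correctly rounded square root of x (both positive normal). */
int is_correctly_rounded(float x, float y)
{
    double lo = ((double)y + (double)u2f(f2u(y) - 1)) / 2;  /* midpoint below y */
    double hi = ((double)y + (double)u2f(f2u(y) + 1)) / 2;  /* midpoint above y */
    return lo * lo < (double)x && (double)x < hi * hi;
}

/* Walks positive normal floats by bit pattern and counts misrounded results. */
unsigned long count_rounding_errors(uint32_t stride)
{
    unsigned long errors = 0;
    for (uint32_t b = 0x00800000u; b <= 0x7F7FFFFFu; b += stride) {
        float x = u2f(b);
        if (!is_correctly_rounded(x, sqrtf_rounded(x)))
            errors++;
    }
    return errors;
}
```

`count_rounding_errors(65537)` samples roughly 32,000 inputs; a stride of 1 covers all of the roughly 2.1 billion positive normal floats, at a correspondingly higher run time.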
The floating point emulation routines come from libgcc and are expected to conform to IEEE floating point specifications, since GCC relies on compliant behavior to produce correct code. Overriding individual routines piecemeal in libnds with potentially inaccurate approximations is not appropriate, sorry. Moreover, using the hardware divider introduces concurrency issues. The comment above was deleted due to references to an ongoing disruptive hostile fork of devkitPro's DS toolchain efforts.