Potential constexpr improvements
I came across libdivide and was curious how it compares with what the compiler generates for constant divisors. I made a few fixes and then added constexpr support. With constexpr, the compiler generates identical code whether the compile-time constant is used directly or through libdivide.
I used this test code:
const unsigned freq = 3417614531;
const auto div_u = libdivide::libdivide_u64_gen(freq);
const auto div_ub = libdivide::libdivide_u64_branchfree_gen(freq);
const auto div_ux = libdivide::divider<uint64_t>(freq);
const auto div_ubx = libdivide::branchfree_divider<uint64_t>(freq);
unsigned toMicro(uint64_t ticks)
{
    return static_cast<unsigned>(ticks * 1000000 / freq);
}
unsigned toMicro_u(uint64_t ticks)
{
    return static_cast<unsigned>(libdivide::libdivide_u64_do(ticks * 1000000, &div_u));
}
// ... and the other variants for div_ub, div_ux, div_ubx
All of the toMicro_xxx variants generated different code. However, once I added constexpr, all of them produce identical results. Here is the test code with current master, and the same test code with the constexpr changes.
To minimize changes, I hijacked LIBDIVIDE_INLINE so that it also expands to constexpr (that is, constexpr LIBDIVIDE_INLINE).
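To illustrate why constexpr lets the compiler fold everything, here is a minimal self-contained sketch. It does not use libdivide or its algorithm: DEMO_INLINE, demo_u32_t, demo_u32_gen, demo_u32_do and to_units are hypothetical names, and the magic number is the simple ceil(2^64 / d) scheme, which is only valid for 32-bit operands and divisors d >= 2.

```cpp
#include <cstdint>

// Hypothetical stand-in for LIBDIVIDE_INLINE after the change: the macro
// carries constexpr in addition to inline.
#define DEMO_INLINE constexpr inline

// Precomputed divider for a 32-bit divisor d >= 2 (illustrative only).
struct demo_u32_t {
    uint64_t magic;  // ceil(2^64 / d)
};

// Compute the magic number. Assumes d >= 2: d == 1 would wrap to 0 and
// d == 0 is a division by zero.
static DEMO_INLINE demo_u32_t demo_u32_gen(uint32_t d) {
    return demo_u32_t{UINT64_MAX / d + 1};
}

// floor(n / d) as the high 64 bits of magic * n. The product is split into
// two 64-bit partial products so no 128-bit type is needed.
static DEMO_INLINE uint32_t demo_u32_do(uint32_t n, demo_u32_t div) {
    return static_cast<uint32_t>(
        ((div.magic >> 32) * n + (((div.magic & 0xFFFFFFFFu) * n) >> 32)) >> 32);
}

// Because both functions are constexpr, the divider is a compile-time
// constant and can even be checked with static_assert.
constexpr uint32_t freq = 3417614531u;
constexpr demo_u32_t div_freq = demo_u32_gen(freq);
static_assert(demo_u32_do(freq - 1, div_freq) == 0, "just below the divisor");
static_assert(demo_u32_do(freq, div_freq) == 1, "exactly the divisor");

// Runtime use: the optimizer sees the magic number as a literal constant.
uint32_t to_units(uint32_t ticks) {
    return demo_u32_do(ticks, div_freq);
}
```

Without constexpr, div_freq would be initialized at run time, so the compiler could not be relied on to treat the magic number as a known constant in to_units.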
I had a couple of questions/observations along the way.
1)
In many places the code looks like this:
static LIBDIVIDE_INLINE struct libdivide_u64_t libdivide_u64_gen(uint64_t d);
...
// later on (note, LIBDIVIDE_INLINE is missing):
struct libdivide_u64_t libdivide_u64_gen(uint64_t d) {
    return libdivide_internal_u64_gen(d, 0);
}
IMO, unlike static (which matters at the first declaration), inline has to appear at the definition site to matter.
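A small sketch of the pattern with the macro repeated at the definition. The names DEMO_INLINE, demo_u64_t and demo_u64_gen are hypothetical stand-ins, not the library's code. Repeating the macro becomes mandatory once it also expands to constexpr, because C++ requires every declaration of a function to carry constexpr if any one of them does.

```cpp
#include <cstdint>

// Hypothetical stand-in for LIBDIVIDE_INLINE with the constexpr change applied.
#define DEMO_INLINE constexpr inline

// Illustrative type; not the library's real struct layout.
struct demo_u64_t {
    uint64_t divisor;
};

// First declaration: static fixes the linkage here.
static DEMO_INLINE demo_u64_t demo_u64_gen(uint64_t d);

// Definition: the macro is repeated so that both declarations agree.
// Dropping it here would fail to compile while the macro contains constexpr.
DEMO_INLINE demo_u64_t demo_u64_gen(uint64_t d) {
    return demo_u64_t{d};
}

static_assert(demo_u64_gen(42).divisor == 42, "usable in a constant expression");
```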
2)
In my case, I needed (uint64_t / uint32_t) -> uint32_t. I figured I simply had to use (uint64_t / uint64_t) -> uint32_t. Could there be any optimizations for the 64/32 case, or does it have to be 64/64?
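For reference, the workaround described above can be sketched in plain C++ as follows. The helper div_u64_by_u32 is a hypothetical name and uses the built-in division operator rather than libdivide. It only shows that widening the divisor to 64 bits is value-preserving and that the narrowing step is where a range check belongs. It does not say whether a dedicated 64/32 path could be faster.

```cpp
#include <cstdint>

// Hypothetical helper: widen the 32-bit divisor, perform a 64/64 division,
// and narrow the quotient only when it fits in 32 bits.
// Returns false for d == 0 or when the quotient does not fit.
inline bool div_u64_by_u32(uint64_t n, uint32_t d, uint32_t& out) {
    if (d == 0) {
        return false;
    }
    const uint64_t q = n / static_cast<uint64_t>(d);  // 64/64 division
    if (q > UINT32_MAX) {
        return false;  // quotient would be truncated by the cast
    }
    out = static_cast<uint32_t>(q);
    return true;
}
```

With the values from the test code, ticks = 3417614531 gives ticks * 1000000 / freq = 1000000, which fits comfortably in 32 bits.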