num-integer
num-integer copied to clipboard
Implement Power10 trait, with first method is_power_of_ten
Here is the first method in the new trait. Just sending it across for a quick review before I start adding more.
Do you prefer
- one pull request, one commit, all methods
- one pull request, one commit per method
- one pull request per method?
I think I need help with fixing the u128 case. I can't seem to hide the static array from the compiler with a cfg
attribute.
I don't understand why it works with 1.20, which doesn't have u128, but not older...
Looks like I managed to hide u128 :smirk: If there is a better way, let me know.
FYI, this is making things a bit difficult.
I prefer one PR with all planned methods, so we can treat it as ready-to-publish once merged, without further breaking changes. Adding more methods on a trait is breaking unless there's a default impl.
I don't care how many commits are in your PR, though I do prefer a clean working progression over seeing useless "fix it" kind of commits.
I don't understand why it works with 1.20, which doesn't have u128, but not older...
I believe the reason is that 1.20 has unstable support for u128, so it at least knows how to parse the big literals, where earlier versions didn't. Hiding it in a macro allowed those earlier versions to discard it without trying to tokenize the contents. An alternative workaround would be to use a const expression of smaller literals, like 10000000000 * 10000000000
.
Hmm, actually it looks like I haven't done a crate-wide fmt
here before, so that would be a lot of noise in this PR. I need to get to that later...
So I need a little help/sounding board here. After looking more closely at the performance, I have two different 64-bit implementations (perfect_hash, lz) that behave like so:
RUSTFLAGS="" perfect_hash: 9000 ns/iter. lz: 7300 ns/iter.
RUSTFLAGS="-C target-cpu=native" perfect_hash: 4100 ns/iter lz: 5000 ns/iter.
I think perfect_hash vectorizes better. I can't make up my mind about which one is the right one for a general purpose library :frowning_face: help?
Those numbers look close enough that I'd lean toward whichever will be easier to maintain.
The vectorization of your microbenchmark may or may not represent how well it can be optimized in real use, where you're more likely to have other computations involved too. So try not to extrapolate too much from these results.
I agree that vectorization is not an important consideration. When I turn off vectorization, lz benefits from native, but perfect_hash does not. I was also worried about 32bit platforms, but a quick check shows that lz with target-cpu=native is better, but worse without native at least on i686.
On balance, I think I'll go with lz for u64. Gotta run the numbers for u32.
Thanks.
This should be a good basis for a full review. I did a cargo fmt
just on the new files. The arrays became quite long, but I left them as fmt wanted. I'll squash again after review edits.
A few things:
- There is a bit of a naming issue right now:
Ten vs 10. Power10, floor_log10 vs. next_power_of_ten, is_power_of_ten
- I modeled the
wrapping/checked/next_power_of_ten
the same way as std, butfloor_log10
just returns 0 for 0 (no alternate methods). - All methods take
&self
. That makes a lot of sense for a trait that may be used for things other than primitives. std usesself
for similar methods, which I feel is problematic for a Power2 trait down the road. I don't intend to implement Power2.
@cuviper ping, just in case this got lost in the holiday shuffle :smile:
Hi! This is still in my inbox to review, but I'll at least thrown opinions at your design questions. :)
- There is a bit of a naming issue right now:
Ten vs 10. Power10, floor_log10 vs. next_power_of_ten, is_power_of_ten
I think the method names are OK. The floats give a precedent for log10
, and integer power_of_two
gives a precedent for _ten
, so we're on par with std
inconsistency here. I could accept either way on the trait name, Power10
or PowerTen
.
- I modeled the
wrapping/checked/next_power_of_ten
the same way as std, butfloor_log10
just returns 0 for 0 (no alternate methods).
I would expect 0 to be a panic case, because log(0)
in any base is undefined.
You can also consider that log(1)
is exactly 0, and log
is strictly increasing, so floor_log(1)
should be the least input which returns 0 -- leaving us with no answer for floor_log(0)
.
Aside on naming - I'm inclined to just call this log10
instead of floor_log10
. I feel the flooring aspect is implicit when we're talking about unsigned integers, just like division.
- All methods take
&self
. That makes a lot of sense for a trait that may be used for things other than primitives. std usesself
for similar methods, which I feel is problematic for a Power2 trait down the road. I don't intend to implement Power2.
&self
is appropriate, and I don't think it's a problem for Power2. Everything we would implement through std
methods are Copy
types, so they can just dereference to get self
by value.
Renamed floor_log10 to log10 and added checked/unchecked variants (wrapping didn't make sense?). Also made sure tests pass in release mode.
BTW, interesting curiosity: 0f64.log10().floor() == 0
(would've made a good excuse for the method name and zero behavior :sweat_smile: )
BTW, interesting curiosity:
0f64.log10().floor() == 0
I get -inf
: playground
And even that only makes from the positive side, lim{x->0+} log(x)
Right, sorry, this got me confused:
0f64.log10().floor() as u32 == 0
playground
It seems we ought to rather have a general Power
trait with a method is_power_of(self, other) -> bool
which checks whether self
is a power of other
.
I suggested generic powers in #14 too, but I think it's still OK to specialize common bases.
The last commit should address the latest review comments. Let me know if there is more to do (I'll squash the commits once the review is done).
Oh, I should clarify one thing: the f64 stuff was mostly a joke. It's so hard to get that through in writing (sigh... :sweat:)
The benchmarks are failing on nightly:
failures:
---- benchl10_u32_up_to_10000 stdout ----
thread 'benchl10_u32_up_to_10000' panicked at 'undefined value for log of zero', src/power10.rs:763:1
note: Run with `RUST_BACKTRACE=1` for a backtrace.
---- benchl10_u64_up_to_10000 stdout ----
thread 'benchl10_u64_up_to_10000' panicked at 'undefined value for log of zero', src/power10.rs:769:1
It was the zero panic. Pushed a fix. BTW, the always panic behavior has a 50% cost in performance for u32 (from 0.8 ns per call to 1.2 ns per call).
BTW, the always panic behavior has a 50% cost in performance for u32
Well, microbenchmarks are hard to extrapolate into real use cases, mixed with other computations. For instance, if the optimizer has enough range information from context to know the input can't be zero, then the check will probably be removed altogether.