FixedPointNumbers.jl
FixedPointNumbers.jl copied to clipboard
Fix misleading statement regarding sign bit
The original formulation of the README made me think that the bit pattern of Fixed{IntN,f}
numbers is [sign bit] [Int(N-1) number]
. I hope this alternative formulation makes it clearer that Fixed{T,f}
numbers are based on 2's complement just like the underlying integer type T
.
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 93.08%. Comparing base (
51b1838
) to head (e571dd4
). Report is 23 commits behind head on master.
Additional details and impacted files
@@ Coverage Diff @@
## master #260 +/- ##
=======================================
Coverage 93.08% 93.08%
=======================================
Files 6 6
Lines 767 767
=======================================
Hits 714 714
Misses 53 53
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
I know this is a matter of taste, but I think the changed description focuses too much on the internal implementation.
In extreme terms, whether or not 2's complement is used is irrelevant to the interface of Fixed
(except for reinterpret
).
Also, the "decimal" point may be introducing new confusion.
Perhaps what we need is more graphical documentation (with Documenter.jl).
The main problem I see with the current description is that to me, "1 sign bit, f
fraction bits" suggests a representation of the form sign * unsigned_integer / 2^f
. So in particular, based on that description I would assume that the smallest (most negative) Fixed{IntN,f}
is -(2^(N-1)+1) / 2^f
, that there is a positive and negative zero like for floating-point numbers, and I wouldn't expect the overflow behavior that Fixed{T,f}
inherits from the underlying integer type.
I believe I understand your thinking, and I think the specific example work well enough to eliminate such ambiguity.
-1.0 = -128/128 ≤ x ≤ 127/128 ≈ 0.992.
As far as the section is concerned, it is reasonable not to refer to the msb as "sign bit".
However, we have difficulty rationalizing the naming rule for aliases such as Q0f7
without using "sign bit".
I simply wanted to maintain this PR since it has been left open. So, I will leave this open.
The "1 sign bit" also mislead me. It does not need to get as technical as proposed here, but I think the current wording is plain wrong.
I think the specific example work well enough to eliminate such ambiguity. This example didn't help me, although I now see in retrospect how it could have helped.
What helped me, was this example
Fixed{Int8,1}(0) - eps(Fixed{Int8,1}) |> reinterpret |> (x -> reinterpret(UInt8, x))
returning 0xff
.