
Round in binary not in decimal representation

Open milankl opened this issue 4 years ago • 15 comments

Be aware that rounding in decimal representation is in general not a rounding in binary representation too, as the former's base (10) is not an integer power of the latter's (2). Currently you seem to use np.around(x, decimals=d), but this does not yield zeroed trailing bits.

import struct
import numpy as np

def binary(num):
    # Bit pattern of `num` packed as a big-endian Float32.
    return ''.join('{:0>8b}'.format(c) for c in struct.pack('!f', num))

In [12]: binary(np.around(np.pi,decimals=2))                                    
Out[12]: '01000000010010001111010111000011'

In [13]: binary(np.around(np.pi,decimals=3))                                    
Out[13]: '01000000010010010001011010000111'

In [14]: binary(np.around(np.pi,decimals=4))                                    
Out[14]: '01000000010010010000111111111001'

Note that you can convert significant digits (i.e. decimal) to significant bits via

"""Number of significant bits `nsb` given the number of significant digits `nsd`."""
nsb(nsd::Integer) = Integer(ceil(log(10)/log(2)*nsd))
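For reference, a minimal Python equivalent of that conversion (my translation; not part of the package):

import math

def nsb(nsd: int) -> int:
    # Significant bits needed to guarantee nsd significant decimal digits.
    return math.ceil(math.log(10) / math.log(2) * nsd)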

That conversion gives 7 significant bits to guarantee 2 significant (decimal) digits, 10 for 3, 14 for 4, and so on. So what you want is a rounding function that does this:

julia> bitstring(round(Float32(π),sigdigits=7,base=2))
"01000000010010100000000000000000"

julia> bitstring(round(Float32(π),sigdigits=10,base=2))
"01000000010010010000000000000000"

julia> bitstring(round(Float32(π),sigdigits=14,base=2))
"01000000010010010001000000000000"

The obvious question is why the compression still works well anyway. I assume this is because if you round two numbers x, y that are fairly close to each other to the same decimal places, you get the same bit patterns in the significand:

In [15]: binary(np.around(np.pi,decimals=2))                                    
Out[15]: '01000000010010001111010111000011'

In [16]: binary(np.around(np.pi*1.001,decimals=2))                              
Out[16]: '01000000010010001111010111000011'

Which is presumably also somewhat compressible. But I guess 0s everywhere are more compressible than these seemingly random, yet repeating, bit patterns.
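One quick way to check that intuition is to compress both variants with zlib (a sketch; for brevity it uses plain bit shaving, i.e. truncating trailing significand bits, instead of proper rounding):

import zlib
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(100_000)

dec = np.around(x, decimals=2)                    # decimal rounding: noisy trailing bits
ui = x.copy().view(np.uint64)
ui &= np.uint64(0xFFFF_FFFF_FFFF_FFFF) << np.uint64(52 - 7)   # keep 7 significand bits
shaved = ui.view(np.float64)

print(len(zlib.compress(dec.tobytes())))          # repeating patterns: compresses somewhat
print(len(zlib.compress(shaved.tobytes())))       # trailing zeros: typically compresses much better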

milankl avatar Jul 14 '20 22:07 milankl

Technically, a proper round-to-nearest (tie-to-even) in binary only requires a few bitwise operations; you can find an example here, in case you either want to implement it yourself or to compare it against a corresponding Python function, should you find one.
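For illustration, a NumPy sketch of what such a bitwise round-to-nearest (tie-to-even) could look like for float64 (my own version, not necessarily the linked implementation; keepbits counts the explicitly stored significand bits, excluding the implicit leading 1):

import numpy as np

def bitround(x, keepbits):
    # Sketch: assumes 1 <= keepbits <= 51; Inf/NaN are not treated specially.
    a = np.asarray(x, dtype=np.float64).copy()
    ui = a.view(np.uint64)                          # reinterpret the bits, no copy
    drop = np.uint64(52 - keepbits)                 # significand bits to discard
    half = (np.uint64(1) << (drop - np.uint64(1))) - np.uint64(1)   # half an ulp, minus 1
    ui += ((ui >> drop) & np.uint64(1)) + half      # the carry does the rounding; ties go to even
    ui &= np.uint64(0xFFFF_FFFF_FFFF_FFFF) << drop  # zero the trailing bits
    return a

bitround(np.pi, 6) gives 3.15625, matching the sigdigits=7 example above (Julia's sigdigits counts the implicit leading bit too).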

milankl avatar Jul 14 '20 22:07 milankl

As a work-around you may do something like this

def binary_round(x, nsb):
    # Rounds to a multiple of 2**-nsb, i.e. keeps nsb *fractional* bits.
    return round(x * 2**nsb) / 2**nsb

In [57]: binary(binary_round(np.pi,7))                                          
Out[57]: '01000000010010010000000000000000'

In [58]: binary_round(np.pi,7)                                                  
Out[58]: 3.140625

But I doubt it works in all cases. EDIT: Yes, the scaling 2**nsb would need to come down by a factor of 2 for every power of 2 that x exceeds 2... something like that.
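A magnitude-independent variant of the work-around could normalize with frexp/ldexp first (a sketch; here nsb counts all significant bits, like Julia's sigdigits):

import math

def binary_round_sig(x, nsb):
    # Normalize to m in [0.5, 1) so the scaling no longer depends on |x|,
    # round the mantissa to nsb bits, then restore the exponent.
    m, e = math.frexp(x)                   # x == m * 2**e
    return math.ldexp(round(m * 2**nsb) / 2**nsb, e)

binary_round_sig(math.pi, 7) gives 3.15625, matching round(Float32(π), sigdigits=7, base=2) above; conveniently, Python 3's round already ties to even.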

milankl avatar Jul 14 '20 22:07 milankl

Interesting - will have a look at implementing this in a future version! Sounds like it could make the compression a lot more efficient.

fraserwg avatar Jul 15 '20 09:07 fraserwg

Also note the pynco project. NCO has a similar lossy+lossless compression strategy, which is explained here.

milankl avatar Jul 17 '20 13:07 milankl

I added a binary rounder (following @milankl) and a significant digit in base-10 rounder (see here in my branch). I did some testing of the impacts on size/fidelity (notebook). Interesting stuff.

@milankl I was wondering if you can explain why NumPy gives the error

Changing the dtype of a 0d array is only supported if the itemsize is unchanged

for NSD≥7 when trying to calculate the shavemask (this line).

Edit: So I found that it actually happens going from NSB 20 -> 21, because that is where np.array switches from assigning dtype 'int64' to 'int32', since 52-21=31 and the int then fits in 32 bits.

zmoon avatar Aug 10 '20 20:08 zmoon

@zmoon92 Great job translating my Julia functions to Python! Unfortunately, I don't know Python well enough to understand how it handles bit types. In my experience there is a lot of implicit stuff happening, which is really annoying to debug (that's why I use Julia for my normal workflow).

Why do you need to convert to an np.array here? Note that in the end you just need 1s for all the bits you want to keep and 0s for the others, so to avoid the 2**-operation you can also do:

julia> ~UInt64(2^(52-10)-1)                  # old, requires ~,^,-,-
0xfffffc0000000000

julia> 0xffff_ffff_ffff_ffff << (52-10)      # requires only <<,-
0xfffffc0000000000

Which is probably advisable in any case...

milankl avatar Aug 15 '20 10:08 milankl

Why do you need to convert to an np.array here?

For the shavemask lines I used np.array just to be able to get an unsigned integer (in order to follow yours exactly), which standard Python doesn't have. But I guess I could use np.uint64/32 instead, like below. Explicitly specifying the type this way, I was able to fix the error I was getting. Weird things happen when I try the other form you suggest in standard Python, I guess because standard ints are not of fixed bit length.

>>> ~np.uint64(2**(52-10)-1)
18446739675663040512
>>> bin(~np.uint64(2**(52-10)-1))
'0b1111111111111111111111000000000000000000000000000000000000000000'
>>> 0xfffffc0000000000 ==  ~np.uint64(2**(52-10)-1)
True
>>> 
>>> 0xffff_ffff_ffff_ffff << (52-10)
81129638414606681691390958632960
>>> bin(0xffff_ffff_ffff_ffff << (52-10))
'0b1111111111111111111111111111111111111111111111111111111111111111000000000000000000000000000000000000000000'
>>> hex(0xffff_ffff_ffff_ffff << (52-10))
'0x3fffffffffffffffc0000000000'
>>> (0xffff_ffff_ffff_ffff << (52-10)).bit_length()
106

However, it seems that with NumPy we can get around this.

>>> 0xffff_ffff_ffff_ffff << np.uint64(52-10)
18446739675663040512
>>> hex(0xffff_ffff_ffff_ffff << np.uint64(52-10))
'0xfffffc0000000000'
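Putting that together, a fully NumPy-typed mask could be built like this (a sketch; shavemask is just my name for it here):

import numpy as np

def shavemask(nsb: int) -> np.uint64:
    # All 64 bits set, shifted left so that only sign, exponent,
    # and the first nsb significand bits of a float64 survive.
    return np.uint64(0xFFFF_FFFF_FFFF_FFFF) << np.uint64(52 - nsb)

assert hex(shavemask(10)) == '0xfffffc0000000000'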

Great job, translating my Julia functions to python. ... In my experience there is a lot of implicit stuff happening, which is really annoying to debug (that's why I use Julia for my normal workflow).

Thanks! I am interested in Julia but haven't worked it into my workflows much yet... Any suggestions?

zmoon avatar Aug 15 '20 15:08 zmoon

>>> bin(0xffff_ffff_ffff_ffff << (52-10))
'0b1111111111111111111111111111111111111111111111111111111111111111000000000000000000000000000000000000000000'
>>> (0xffff_ffff_ffff_ffff << (52-10)).bit_length()
106

It seems that for <<(::UInt,::Int) Python does not just push the 1s off the edge, but actually extends the bit string by however many bits are needed (which I find weird behaviour, but okay...). That's why 64+42=106. Yeah, in Julia << and >> are always defined as padding with zeros, and bits are simply pushed off the edge. I assume that once you use np.uint, NumPy calls the underlying C functions, which behave like Julia (or rather, Julia behaves like C for these bitwise operations; in most cases you can just copy&paste C code ;))

Thanks! I am interested in Julia but haven't worked it into my workflows much yet... Any suggestions?

Check out https://julialang.org/learning/

milankl avatar Aug 15 '20 15:08 milankl

Once this works, could you benchmark your binary round function? I would be interested in how quickly Python can process large arrays.

milankl avatar Aug 15 '20 15:08 milankl

Here is one benchmark example running on one thread with an array that is big but fits fine in RAM.
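For anyone wanting a rough standalone version, a minimal timing sketch (it assumes a bitround(x, keepbits) function like the one sketched earlier in this thread; the array size mirrors the Julia benchmark below):

import timeit
import numpy as np

x = np.random.rand(1000, 1000, 100) * 400   # ~763 MiB of float64
t = min(timeit.repeat(lambda: bitround(x, 7), number=1, repeat=5))
print(f"{t:.3f} s per call")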

zmoon avatar Aug 20 '20 19:08 zmoon

Thanks Zachary, I believe there is optimization potential here: binary rounding should not allocate more memory than the output array. On my MacBook Air:

julia> using BenchmarkTools, Elefridge
julia> x = rand(1000, 1000, 100)*400;

julia> @btime x.^2;
  311.556 ms (6 allocations: 762.94 MiB)

julia> @btime round(x,7);
  311.105 ms (2 allocations: 762.94 MiB)

milankl avatar Aug 21 '20 09:08 milankl

Ok, thanks @milankl, I think I fixed this in my binary rounder by changing the return. According to line-by-line memory profiling with %mprun the memory increment is now almost 0 for inplace=True. And it does increase the speed somewhat as well.
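The in-place pattern could look roughly like this (a sketch of the idea, not necessarily the exact change in my branch):

import numpy as np

def bitround(x, keepbits, inplace=False):
    # With inplace=True, x's buffer is modified directly and no copy is made
    # (NumPy still allocates temporaries for the intermediate expressions).
    a = x if inplace else x.copy()
    ui = a.view(np.uint64)                 # reinterpret, no copy
    drop = np.uint64(52 - keepbits)
    half = (np.uint64(1) << (drop - np.uint64(1))) - np.uint64(1)
    ui += ((ui >> drop) & np.uint64(1)) + half
    ui &= np.uint64(0xFFFF_FFFF_FFFF_FFFF) << drop
    return a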

I updated my profiling notebook with some additional speed tests.

zmoon avatar Aug 21 '20 19:08 zmoon

Question: your binary round function does not get compiled, right? Because that might explain the remaining 5x speed difference...

milankl avatar Aug 22 '20 12:08 milankl

Yeah. Although much of NumPy is written in C and compiled ahead of time, my function using NumPy's arrays and operations doesn't get the same treatment. I tried Numba and got a bit of a boost, though I had to change a few things in the code; I plan to try Cython as well, just to see.
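Roughly what the Numba variant might look like (a sketch; mask and half are computed outside the jitted loop to keep the uint64 arithmetic explicit, and the names are mine):

import numpy as np
from numba import njit

@njit
def _round_loop(ui, drop, half, mask):
    # ui is the flat uint64 view of a float64 array; modified in place.
    one = np.uint64(1)
    for i in range(ui.shape[0]):
        ui[i] = (ui[i] + (((ui[i] >> drop) & one) + half)) & mask

def bitround_numba(x, keepbits):
    a = x.copy()
    drop = np.uint64(52 - keepbits)
    half = (np.uint64(1) << (drop - np.uint64(1))) - np.uint64(1)
    mask = np.uint64(0xFFFF_FFFF_FFFF_FFFF) << drop
    _round_loop(a.view(np.uint64).ravel(), drop, half, mask)
    return a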

zmoon avatar Aug 22 '20 16:08 zmoon

Yeah, you have to decide whether it's worth it. But as you can see, a binary rounding function does not have to be slower than, for example, squaring every element.

milankl avatar Aug 23 '20 09:08 milankl