numcodecs

FixedScaleOffset cannot be combined with BitRound to reduce quantization errors as expected

Open point9repeating opened this issue 7 months ago • 0 comments

import numpy as np
import numcodecs

# offset=273.15, scale=1.0; float32 input, float32 encoded output
codec = numcodecs.fixedscaleoffset.FixedScaleOffset(273.15, 1.0, 'f4', astype='f4')
# temperature in K
temperature = np.array([263.05, 273.05, 273.35, 283.25, 293.55, 304.05, 313.94998], dtype=np.float32)
# shift temperature from K to degrees C; note that the fractional part is rounded away
codec.encode(temperature)
# Out: array([-10.,  -0.,   0.,  10.,  20.,  31.,  41.], dtype=float32)

Problem description

I'm looking at implementing lossy compression with the BitRound filter for some large weather datasets stored in zarr. Some parameters are stored in units that put all values at a fairly large magnitude (e.g. well outside the range [2^0, 2^1]). One example is temperature in Kelvin. In such cases, the quantization errors after applying BitRound are larger than they need to be.
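As a rough sketch of why magnitude matters (assuming BitRound rounds the float mantissa to nearest and keeps $k$ mantissa bits; $\hat{x}$ is my notation for the rounded value), a value $x$ with $2^{e} \le |x| < 2^{e+1}$ is quantized with spacing $2^{e-k}$, so

$$
|\hat{x} - x| \le 2^{e-k-1}.
$$

With $k = 8$, temperatures near 273 K have $e = 8$, giving a bound of $2^{-1} = 0.5$. The same temperatures in degrees C (up to about 40) have $e \le 5$, giving a bound of $2^{-4} = 0.0625$.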

If I could offset the data to a more reasonable range, I could achieve smaller quantization errors. It looked like FixedScaleOffset would be just the ticket after I saw that it accepts an astype argument. Unfortunately, FixedScaleOffset always rounds the data to integers before casting to that type. I tested a local implementation of FixedScaleOffset and found that removing this rounding achieved the desired behavior.

I would like to chain the FixedScaleOffset and BitRound filters in a way that minimizes quantization errors. In one local test, I used bit rounding with keepbits=8 on a temperature array. The maximum quantization errors were +/-0.5 degrees. Using FixedScaleOffset without integer rounding first, these errors were reduced to +/-0.0625 degrees.
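For anyone who wants to reproduce the comparison without a patched numcodecs, here is a NumPy-only sketch. `bitround_f32` is a hypothetical stand-in for BitRound (round-to-nearest on the float32 mantissa), not the numcodecs implementation, and the temperature grid is made up for illustration rather than taken from my dataset.

```python
import numpy as np


def bitround_f32(values, keepbits):
    """Hypothetical stand-in for BitRound: keep `keepbits` mantissa bits.

    Assumes float32 input and 1 <= keepbits < 23.
    """
    a = np.ascontiguousarray(values, dtype=np.float32)
    drop = 23 - keepbits                       # mantissa bits to discard
    bits = a.view(np.uint32)                   # reinterpret the raw bit pattern
    half = np.uint32((1 << (drop - 1)) - 1)    # just under half a quantum
    tie = (bits >> np.uint32(drop)) & np.uint32(1)  # breaks ties toward even
    mask = np.uint32((0xFFFFFFFF >> drop) << drop)  # zero the discarded bits
    return ((bits + half + tie) & mask).view(np.float32)


# Illustrative temperatures in K (roughly -10 C to 40 C)
kelvin = np.linspace(263.15, 313.15, 1001).astype(np.float32)
offset = np.float32(273.15)

# Bit-round the Kelvin values directly
direct = bitround_f32(kelvin, keepbits=8)

# Offset to degrees C (no integer rounding), bit-round, then undo the offset
shifted = bitround_f32(kelvin - offset, keepbits=8) + offset

err_direct = float(np.max(np.abs(direct - kelvin)))
err_offset = float(np.max(np.abs(shifted - kelvin)))
print("max error, direct:", err_direct)
print("max error, offset first:", err_offset)
```

The second path is what I would expect a FixedScaleOffset (without rounding) followed by BitRound chain to behave like.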

Potential enhancement

We could check that the astype argument is an integer dtype. If it is, we apply rounding. Otherwise, we leave the data alone.

Or, we could add an optional argument to FixedScaleOffset that controls whether or not rounding to integers is applied and default that to True for backwards compatibility.
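To make the two options concrete, here is a minimal sketch in plain NumPy. The names `should_round`, `encode_scale_offset`, and `round_to_int` are hypothetical and only illustrate the idea; this is not the current FixedScaleOffset code or API.

```python
import numpy as np


def should_round(astype):
    """Option 1 (hypothetical): round only when the target dtype is an integer."""
    return np.dtype(astype).kind in "iu"


def encode_scale_offset(arr, offset, scale, astype, round_to_int=True):
    """Option 2 (hypothetical): an explicit flag, defaulting to today's behavior."""
    enc = (np.asarray(arr) - offset) * scale
    if round_to_int:
        enc = np.around(enc)
    return enc.astype(np.dtype(astype), copy=False)


temperature = np.array(
    [263.05, 273.05, 273.35, 283.25, 293.55, 304.05, 313.94998], dtype=np.float32
)

# Default keeps the existing rounding, so current users see no change
rounded = encode_scale_offset(temperature, 273.15, 1.0, "f4")

# Opting out (or inferring it from a float astype) preserves the fractional part
unrounded = encode_scale_offset(
    temperature, 273.15, 1.0, "f4", round_to_int=should_round("f4")
)
```

Either way, an integer astype would still round as it does today.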

point9repeating • Jul 12 '24 19:07