bcolz
Setting quantize default is applied automatically to non-float columns
Hi,
I'm fiddling around a bit with chunklen, compression, etc. We have a lot of non-homogeneous ctables (mixes of int64 and float64 columns), and some of the ctable functionality runs into issues with them. For example, if you set the defaults like this:
bcolz.cparams.setdefaults(cname='zstd', clevel=5, shuffle=2, quantize=16)
you get the following error when creating a non-homogeneous ctable with fromiter:
/srv/python/venv/local/lib/python2.7/site-packages/bcolz/toplevel.pyc in fromiter(iterable, dtype, count, **kwargs)
207 # Iterable has been exhausted
208 break
--> 209 obj.append(chunk)
210 obj.flush()
211 return obj
/srv/python/venv/local/lib/python2.7/site-packages/bcolz/ctable.pyc in append(self, cols)
427 column = cols[name]
428 # Append the values to column
--> 429 self.cols[name].append(column)
430 if sclist and not hasattr(column, '__len__'):
431 clen2 = 1
bcolz/carray_ext.pyx in bcolz.carray_ext.carray.append (bcolz/carray_ext.c:21637)()
bcolz/carray_ext.pyx in bcolz.carray_ext.chunk.__cinit__ (bcolz/carray_ext.c:5278)()
bcolz/carray_ext.pyx in bcolz.carray_ext.chunk.compress_arrdata (bcolz/carray_ext.c:5924)()
/srv/python/venv/local/lib/python2.7/site-packages/bcolz/utils.pyc in quantize(data, significant_digits)
178
179 if data.dtype.kind != 'f':
--> 180 raise TypeError("quantize is meant only for floating point data")
181
182 if not significant_digits:
TypeError: quantize is meant only for floating point data
I will try to be helpful and see if I can make this smarter with a PR that ignores quantization for non-float columns. I might also create a PR to allow a more complex chunklen for ctables (per column), but that's not the issue here.
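To make the idea concrete, here is a minimal sketch of the rule I have in mind: apply the quantize default only when a column's dtype kind is 'f', and otherwise fall back to 0. This is not bcolz code. The helper names `effective_quantize` and `per_column_quantize` are hypothetical, and it assumes a falsy quantize value means "no quantization", which is what the `if not significant_digits:` check in the traceback suggests.

```python
def effective_quantize(dtype_kind, quantize):
    """Return the quantize setting to use for one column.

    Hypothetical helper, not part of bcolz. `dtype_kind` is the
    one-character NumPy kind code (what `arr.dtype.kind` returns):
    'f' is floating point, 'i' is signed integer, and so on.
    """
    # Assumption: 0 (falsy) disables quantization for this column.
    return quantize if dtype_kind == "f" else 0


def per_column_quantize(column_kinds, quantize):
    """Map each column name to the quantize value it should use."""
    return {
        name: effective_quantize(kind, quantize)
        for name, kind in column_kinds.items()
    }


# A mixed table like the one in this report: one int64 column, one float64.
column_kinds = {"id": "i", "price": "f"}
settings = per_column_quantize(column_kinds, quantize=16)
print(settings)
```

With a rule like this, the global default would still reach the float columns, while the integer columns would never hit the TypeError raised in utils.quantize.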
Yeah, a PR on that would be great. Thanks in advance!