ctable fromiter can't handle lists

Open ckingdev opened this issue 7 years ago • 0 comments

This is altogether a pretty minor issue, and it's easy to work around, but I didn't see any documentation about it and I only happened upon the workaround accidentally.

When creating a ctable with bcolz.fromiter, if the elements of the iterable are lists rather than tuples, the internal call to np.fromiter raises a ValueError.

An example:

import bcolz

sq1 = [(0, 2), (1, 4), (2, 6)]
sq2 = [[0, 2], [1, 4], [2, 6]]

# ok
ct1 = bcolz.fromiter(sq1, dtype="i8,i8", count=3)

# ValueError: setting an array element with a sequence
ct2 = bcolz.fromiter(sq2, dtype="i8,i8", count=3)

The full traceback:

ValueError                                Traceback (most recent call last)
<ipython-input-12-7a50c10b0055> in <module>()
----> 1 ct = bcolz.fromiter(sq2, dtype="i8,i8", count=3)

~/.pyenv/versions/3.6.4/lib/python3.6/site-packages/bcolz/toplevel.py in fromiter(iterable, dtype, count, **kwargs)
    203     # Then fill it
    204     while True:
--> 205         chunk = np.fromiter(it.islice(iterable, chunklen), dtype=dtype)
    206         if len(chunk) == 0:
    207             # Iterable has been exhausted

ValueError: setting an array element with a sequence.

The obvious solution is to convert the iterable to an iterable of tuples, but it was not immediately obvious that the lists were the problem. If it's not possible to work around this in the implementation, or to throw a more informative exception (maybe check the first element of the iterable for a list?), adding a note in the docs would be helpful, I think.
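
For anyone hitting the same error, here is a rough sketch of both ideas: converting rows to tuples on the fly, and peeking at the first element to fail early with a clearer message. The helper names `rows_as_tuples` and `reject_list_rows` are made up for illustration and are not part of bcolz; the bcolz call itself is left as a comment so the snippet runs without the library installed.

```python
import itertools


def rows_as_tuples(rows):
    """Yield each row as a tuple (hypothetical helper, not a bcolz API)."""
    for row in rows:
        yield tuple(row)


def reject_list_rows(rows):
    """Peek at the first row and raise a clearer error if it is a list.

    Hypothetical helper: only the first element is checked, as suggested
    above, so a list appearing later in the iterable is not caught here.
    """
    it = iter(rows)
    try:
        first = next(it)
    except StopIteration:
        return iter(())  # empty input: nothing to check
    if isinstance(first, list):
        raise TypeError(
            "rows must be tuples, not lists; "
            "try passing (tuple(r) for r in rows) instead"
        )
    # Put the peeked row back in front of the remaining rows.
    return itertools.chain([first], it)


sq2 = [[0, 2], [1, 4], [2, 6]]
converted = list(rows_as_tuples(sq2))

# With bcolz installed, the generator can be passed straight through:
#   ct2 = bcolz.fromiter(rows_as_tuples(sq2), dtype="i8,i8", count=3)
```

The generator version avoids building a second copy of the data in memory, which matters if the source iterable is large.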

Thanks! I've been really impressed with this library: I currently have a 62 GB table in RAM taking up just 2.6 GB. Fantastic work.

ckingdev, Mar 06 '18 05:03