Efficient bcolz.ctable itersequence/read_coordinates methods
It looks like bcolz.ctable[list_of_indices] is very slow:
Python 3.5.1 |Anaconda custom (64-bit)| (default, Dec 7 2015, 11:16:01)
In [1]: %paste
In [1]: import numpy as np
In [2]: import bcolz
In [3]: from collections import namedtuple
In [4]: from itertools import islice
In [5]: bcolz.print_versions()
## -- End pasted text --
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
bcolz version: 1.1.1.dev16
NumPy version: 1.11.1
Blosc version: 1.9.3 ($Date:: 2016-07-06 #$)
Blosc compressors: ['blosclz', 'lz4', 'lz4hc', 'snappy', 'zlib']
Numexpr version: 2.6.1
Dask version: 0.11.0
Python version: 3.5.1 |Anaconda custom (64-bit)| (default, Dec 7 2015, 11:16:01)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
Platform: linux-x86_64
Byte-ordering: little
Detected cores: 4
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
In [2]: %paste
In [6]: N = int(1e6)
In [7]: ct = bcolz.fromiter(((i,i*i) for i in range(N)), dtype="i4,f8", count=N, rootdir='bz_test', mode='w')
In [8]: from random import randint, sample
In [9]: list_of_indx = sample(range(1000000), k=100000)
In [10]: %time ct[list_of_indx]
## -- End pasted text --
CPU times: user 8.56 s, sys: 1.44 s, total: 10 s
Wall time: 10 s
Out[2]:
array([(807054, 651336158916.0), (882754, 779254624516.0),
(918705, 844018877025.0), ..., (799379, 639006785641.0),
(620003, 384403720009.0), (42781, 1830213961.0)],
dtype=[('f0', '<i4'), ('f1', '<f8')])
I wonder if it would be possible for bcolz to have itersequence/read_coordinates methods that are as efficient as their PyTables counterparts?
Thank you, Anton
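For reference, this is the PyTables API I mean (a minimal sketch; the file name is just illustrative, matching the 'h5_test.h5' file created later in this thread):

import tables

# The two PyTables Table methods in question:
h5_table = tables.open_file('h5_test.h5').root.test
rows = h5_table.read_coordinates(indx)    # gather rows at arbitrary positions
for row in h5_table.itersequence(indx):   # or iterate over the same positions
    print(row['f0'], row['f1'])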
Well, everything can be made faster, but how much faster is PyTables in this situation?
Right, here is a benchmark against PyTables.
In [1]: import tables
In [2]: tables.__version__
Out[2]: '3.2.4.dev0'
In [3]: import bcolz
In [4]: bcolz.print_versions()
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
bcolz version: 1.1.1.dev18+dirty
NumPy version: 1.11.1
Blosc version: 1.9.3 ($Date:: 2016-07-06 #$)
Blosc compressors: ['blosclz', 'lz4', 'lz4hc', 'snappy', 'zlib']
Numexpr version: 2.6.1
Dask version: 0.11.0
Python version: 3.5.1 |Anaconda custom (64-bit)| (default, Dec 7 2015, 11:16:01)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
Platform: linux-x86_64
Byte-ordering: little
Detected cores: 4
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
In [5]: N = int(1e6)
In [6]: ct = bcolz.fromiter(((i,i*i) for i in range(N)), dtype="i4,f8", count=N, rootdir='bz_test', mode='w')
In [7]: from random import sample
In [8]: indx = sample(range(1000000), k=100000)
In [9]: %time ct[indx]
CPU times: user 8.85 s, sys: 1.52 s, total: 10.4 s
Wall time: 10.4 s
Out[9]:
array([(33992, 1155456064.0), (274165, 75166447225.0),
(723780, 523857488400.0), ..., (186771, 34883406441.0),
(488874, 238997787876.0), (618955, 383105292025.0)],
dtype=[('f0', '<i4'), ('f1', '<f8')])
In [10]: ct.tohdf5('h5_test.h5', '/test', mode='w')
In [11]: h5_table = tables.open_file('h5_test.h5').root.test
In [12]: h5_table
Out[12]:
/test (Table(1000000,), shuffle, blosc:lz4(5)) ''
description := {
"f0": Int32Col(shape=(), dflt=0, pos=0),
"f1": Float64Col(shape=(), dflt=0.0, pos=1)}
byteorder := 'little'
chunkshape := (10922,)
In [13]: %time h5_table.read_coordinates(indx)
CPU times: user 1.09 s, sys: 16 ms, total: 1.11 s
Wall time: 1.11 s
Oh yes, that's a big difference, and it should be fixed. Pull requests are welcome!
It looks like reading random coordinates of a bcolz.ctable comes down to reading random coordinates of its carray columns. And from http://bcolz.readthedocs.io/en/latest/tutorial.html: 'Keep in mind that carrays are great for sequential access, but random access will highly likely trigger decompression of a different chunk for each randomly accessed value.'
Do you think that could be the issue in this case?
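As a rough sanity check on that hypothesis (a sketch using the real carray.chunklen attribute; the exact numbers will vary with the chunk size chosen):

# Rough chunk-cache arithmetic for this dataset:
chunklen = ct['f0'].chunklen          # rows per compressed chunk in one column
n_chunks = -(-len(ct) // chunklen)    # ceiling division
# With 100,000 single-row reads scattered across n_chunks chunks, two
# consecutive reads rarely land in the same chunk, so the one-chunk read
# cache almost always misses and each read pays a full decompression.
print(chunklen, n_chunks)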
Not really because PyTables also has to deal with chunked data, so the bottleneck must be in some other place. Doing a profile would be helpful.
bcolz.ctable reading random coordinates
In [13]: import cProfile
In [14]: cProfile.run('print (ct[indx])')
[(276899, 76673056201.0) (307470, 94537800900.0) (466150, 217295822500.0)
..., (33871, 1147244641.0) (292008, 85268672064.0) (238873, 57060310129.0)]
8059373 function calls (7959361 primitive calls) in 13.498 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
10 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:996(_handle_fromlist)
1 0.000 0.000 13.498 13.498 <string>:1(<module>)
1 0.000 0.000 0.000 0.000 arrayprint.py:208(_leading_trailing)
6 0.000 0.000 0.000 0.000 arrayprint.py:234(repr_format)
1 0.000 0.000 0.000 0.000 arrayprint.py:237(_array2string)
6 0.000 0.000 0.000 0.000 arrayprint.py:328(_convert_arrays)
7/1 0.000 0.000 0.000 0.000 arrayprint.py:340(array2string)
7 0.000 0.000 0.000 0.000 arrayprint.py:450(_extendLine)
1 0.000 0.000 0.000 0.000 arrayprint.py:458(_formatArray)
3 0.000 0.000 0.000 0.000 arrayprint.py:529(__init__)
3 0.000 0.000 0.000 0.000 arrayprint.py:543(fillFormat)
1 0.000 0.000 0.000 0.000 arrayprint.py:635(__init__)
3 0.000 0.000 0.000 0.000 arrayprint.py:657(__init__)
1 0.000 0.000 0.000 0.000 arrayprint.py:685(__init__)
1 0.000 0.000 0.000 0.000 arrayprint.py:696(__init__)
1 0.000 0.000 0.000 0.000 arrayprint.py:713(__init__)
1 0.000 0.000 0.000 0.000 arrayprint.py:734(__init__)
200004 0.093 0.000 0.255 0.000 carray_ext.pyx:1020(__get__)
200004 0.113 0.000 0.162 0.000 carray_ext.pyx:1030(__get__)
200000 0.175 0.000 10.414 0.000 carray_ext.pyx:1813(getitem_cache)
200000 0.507 0.000 11.001 0.000 carray_ext.pyx:1900(__getitem__)
186696 0.080 0.000 0.080 0.000 carray_ext.pyx:345(__cinit__)
195252 2.462 0.000 2.462 0.000 carray_ext.pyx:514(_getitem)
746784 0.065 0.000 0.065 0.000 carray_ext.pyx:649(decode_byte)
560088 0.206 0.000 0.206 0.000 carray_ext.pyx:654(decode_uint32)
186696 0.534 0.000 0.805 0.000 carray_ext.pyx:658(decode_blosc_header)
186696 0.021 0.000 0.021 0.000 carray_ext.pyx:710(__get__)
186696 0.192 0.000 0.882 0.000 carray_ext.pyx:717(__get__)
186696 0.415 0.000 1.659 0.000 carray_ext.pyx:759(_chunk_file_name)
186696 4.070 0.000 7.367 0.000 carray_ext.pyx:763(read_chunk)
196458 0.330 0.000 7.777 0.000 carray_ext.pyx:793(__getitem__)
400004 0.058 0.000 0.058 0.000 carray_ext.pyx:987(__get__)
400004 0.100 0.000 0.100 0.000 carray_ext.pyx:992(__get__)
100001/1 0.648 0.000 13.498 13.498 ctable.py:1194(__getitem__)
100000 0.334 0.000 11.388 0.000 ctable.py:1224(<listcomp>)
2 0.000 0.000 0.000 0.000 ctable.py:1240(<genexpr>)
100001 0.178 0.000 13.356 0.000 ctable.py:1251(<genexpr>)
100002 0.543 0.000 0.952 0.000 ctable.py:189(dtype)
200002 0.055 0.000 0.055 0.000 ctable.py:202(names)
400004 0.095 0.000 0.095 0.000 ctable.py:83(__getitem__)
186696 0.132 0.000 0.833 0.000 genericpath.py:16(exists)
7/1 0.000 0.000 0.000 0.000 numeric.py:1835(array_str)
6 0.000 0.000 0.000 0.000 numeric.py:2576(seterr)
6 0.000 0.000 0.000 0.000 numeric.py:2676(geterr)
3 0.000 0.000 0.000 0.000 numeric.py:2963(__init__)
3 0.000 0.000 0.000 0.000 numeric.py:2967(__enter__)
3 0.000 0.000 0.000 0.000 numeric.py:2972(__exit__)
373392 0.159 0.000 0.264 0.000 posixpath.py:39(_get_sep)
373392 0.562 0.000 1.031 0.000 posixpath.py:71(join)
1 0.000 0.000 0.000 0.000 {built-in method _functools.reduce}
1 0.000 0.000 0.000 0.000 {built-in method builtins.all}
1 0.000 0.000 13.498 13.498 {built-in method builtins.exec}
20 0.000 0.000 0.000 0.000 {built-in method builtins.hasattr}
473425 0.150 0.000 0.150 0.000 {built-in method builtins.isinstance}
6 0.000 0.000 0.000 0.000 {built-in method builtins.issubclass}
21 0.000 0.000 0.000 0.000 {built-in method builtins.len}
1 0.000 0.000 0.000 0.000 {built-in method builtins.print}
6 0.000 0.000 0.000 0.000 {built-in method builtins.repr}
1 0.014 0.014 0.014 0.014 {built-in method numpy.core.multiarray.array}
1 0.000 0.000 0.000 0.000 {built-in method numpy.core.multiarray.concatenate}
1 0.127 0.127 13.483 13.483 {built-in method numpy.core.multiarray.fromiter}
12 0.000 0.000 0.000 0.000 {built-in method numpy.core.umath.geterrobj}
6 0.000 0.000 0.000 0.000 {built-in method numpy.core.umath.seterrobj}
186696 0.701 0.000 0.701 0.000 {built-in method posix.stat}
200016 0.049 0.000 0.049 0.000 {method 'append' of 'list' objects}
100000 0.122 0.000 0.122 0.000 {method 'copy' of 'numpy.ndarray' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
373392 0.077 0.000 0.077 0.000 {method 'endswith' of 'str' objects}
6 0.000 0.000 0.000 0.000 {method 'item' of 'numpy.ndarray' objects}
3 0.000 0.000 0.000 0.000 {method 'pop' of 'dict' objects}
1 0.000 0.000 0.000 0.000 {method 'reduce' of 'numpy.ufunc' objects}
15 0.000 0.000 0.000 0.000 {method 'rstrip' of 'str' objects}
373392 0.128 0.000 0.128 0.000 {method 'startswith' of 'str' objects}
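Reading that profile, read_chunk dominates (4.07 s tottime, 7.37 s cumtime out of 13.5 s total), which fits the chunk-miss picture (a rough interpretation of the numbers above):

# 100,000 indices x 2 columns = 200,000 single-row reads, which triggered
# 186,696 read_chunk calls, i.e. 186,696 chunk decompressions:
print(186696 / 200000)   # ~0.93, so the chunk cache misses ~93% of the time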
PyTables read_coordinates
In [18]: cProfile.run('print (h5_table.read_coordinates(indx))')
[(276899, 76673056201.0) (307470, 94537800900.0) (466150, 217295822500.0)
..., (33871, 1147244641.0) (292008, 85268672064.0) (238873, 57060310129.0)]
263 function calls (251 primitive calls) in 1.087 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
10 0.000 0.000 0.000 0.000 <frozen importlib._bootstrap>:996(_handle_fromlist)
1 0.000 0.000 1.087 1.087 <string>:1(<module>)
2 0.000 0.000 0.000 0.000 _methods.py:37(_any)
1 0.000 0.000 0.000 0.000 arrayprint.py:208(_leading_trailing)
6 0.000 0.000 0.000 0.000 arrayprint.py:234(repr_format)
1 0.000 0.000 0.000 0.000 arrayprint.py:237(_array2string)
6 0.000 0.000 0.000 0.000 arrayprint.py:328(_convert_arrays)
7/1 0.000 0.000 0.000 0.000 arrayprint.py:340(array2string)
7 0.000 0.000 0.000 0.000 arrayprint.py:450(_extendLine)
1 0.000 0.000 0.000 0.000 arrayprint.py:458(_formatArray)
3 0.000 0.000 0.000 0.000 arrayprint.py:529(__init__)
3 0.000 0.000 0.000 0.000 arrayprint.py:543(fillFormat)
1 0.000 0.000 0.000 0.000 arrayprint.py:635(__init__)
3 0.000 0.000 0.000 0.000 arrayprint.py:657(__init__)
1 0.000 0.000 0.000 0.000 arrayprint.py:685(__init__)
1 0.000 0.000 0.000 0.000 arrayprint.py:696(__init__)
1 0.000 0.000 0.000 0.000 arrayprint.py:713(__init__)
1 0.000 0.000 0.000 0.000 arrayprint.py:734(__init__)
1 0.000 0.000 0.000 0.000 flavor.py:113(array_of_flavor2)
1 0.000 0.000 0.000 0.000 flavor.py:136(flavor_to_flavor)
1 0.000 0.000 0.000 0.000 flavor.py:158(internal_to_flavor)
1 0.000 0.000 0.000 0.000 flavor.py:368(conv_to_numpy)
1 0.000 0.000 0.000 0.000 flavor.py:380(_conv_numpy_to_numpy)
2 0.000 0.000 0.000 0.000 fromnumeric.py:1892(any)
1 0.000 0.000 0.000 0.000 leaf.py:203(flavor)
1 0.001 0.001 0.013 0.013 leaf.py:497(_point_selection)
1 0.000 0.000 0.000 0.000 node.py:336(_g_check_open)
7/1 0.000 0.000 0.000 0.000 numeric.py:1835(array_str)
6 0.000 0.000 0.000 0.000 numeric.py:2576(seterr)
6 0.000 0.000 0.000 0.000 numeric.py:2676(geterr)
3 0.000 0.000 0.000 0.000 numeric.py:2963(__init__)
3 0.000 0.000 0.000 0.000 numeric.py:2967(__enter__)
3 0.000 0.000 0.000 0.000 numeric.py:2972(__exit__)
2 0.000 0.000 0.000 0.000 numeric.py:414(asarray)
2 0.000 0.000 0.000 0.000 numeric.py:484(asanyarray)
1 0.000 0.000 1.078 1.078 table.py:1932(_read_coordinates)
1 0.009 0.009 1.087 1.087 table.py:1966(read_coordinates)
2 0.000 0.000 0.000 0.000 table.py:534(shape)
1 0.000 0.000 0.000 0.000 table.py:959(_get_container)
1 0.000 0.000 0.000 0.000 {built-in method _functools.reduce}
1 0.000 0.000 1.087 1.087 {built-in method builtins.exec}
21 0.000 0.000 0.000 0.000 {built-in method builtins.hasattr}
32 0.000 0.000 0.000 0.000 {built-in method builtins.isinstance}
6 0.000 0.000 0.000 0.000 {built-in method builtins.issubclass}
23 0.000 0.000 0.000 0.000 {built-in method builtins.len}
1 0.000 0.000 0.000 0.000 {built-in method builtins.min}
1 0.000 0.000 0.000 0.000 {built-in method builtins.print}
6 0.000 0.000 0.000 0.000 {built-in method builtins.repr}
5 0.012 0.002 0.012 0.002 {built-in method numpy.core.multiarray.array}
1 0.000 0.000 0.000 0.000 {built-in method numpy.core.multiarray.concatenate}
1 0.000 0.000 0.000 0.000 {built-in method numpy.core.multiarray.empty}
12 0.000 0.000 0.000 0.000 {built-in method numpy.core.umath.geterrobj}
6 0.000 0.000 0.000 0.000 {built-in method numpy.core.umath.seterrobj}
1 1.065 1.065 1.065 1.065 {method '_read_elements' of 'tables.tableextension.Table' objects}
2 0.000 0.000 0.000 0.000 {method 'any' of 'numpy.ndarray' objects}
12 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
6 0.000 0.000 0.000 0.000 {method 'item' of 'numpy.ndarray' objects}
3 0.000 0.000 0.000 0.000 {method 'pop' of 'dict' objects}
3 0.000 0.000 0.000 0.000 {method 'reduce' of 'numpy.ufunc' objects}
15 0.000 0.000 0.000 0.000 {method 'rstrip' of 'str' objects}
Ignoring the other (apparent?) issues in that profile, I was curious about the impact of simply coalescing the chunk accesses, so I created a proof-of-concept that presorts the indices, then reshuffles before returning. It costs you double the memory (hey, it's a POC after all), but it demonstrates the performance impact of the issue.
On my computer, reading the 100,000 random indices takes ~7.90 s. Presorting and reshuffling drops that to ~415 ms.
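In user-level terms, the idea is roughly this (a minimal sketch assuming NumPy; the take_presorted helper is hypothetical and for illustration only, the actual POC does this inside bcolz):

import numpy as np

def take_presorted(ct, indices):
    idx = np.asarray(indices)
    order = np.argsort(idx)        # permutation that visits chunks in order
    gathered = ct[idx[order]]      # ascending reads: each chunk decompressed once
    out = np.empty_like(gathered)  # second copy -> the "double the memory" cost
    out[order] = gathered          # scatter back into the original request order
    return out

Sorted access helps even without patching bcolz, because consecutive sorted indices usually hit the chunk already held in getitem_cache instead of forcing a fresh decompression.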
> Created a proof-of-concept that presorts the indices, then reshuffles before returning.
I can confirm that with that patch included, the run time goes down significantly in this case.
In [4]: %paste
In [5]: N = int(1e6)
In [6]: ct = bcolz.fromiter(((i,i*i) for i in range(N)), dtype="i4,f8", count=N, rootdir='bz_test', mode='w')
In [7]: from random import sample
In [8]: indx = sample(range(1000000), k=100000)
In [9]: %time ct[indx]
## -- End pasted text --
CPU times: user 809 ms, sys: 3 ms, total: 812 ms
Wall time: 808 ms
Out[4]:
array([(428294, 183435750436.0), (37513, 1407225169.0),
(518376, 268713677376.0), ..., (663849, 440695494801.0),
(115971, 13449272841.0), (203379, 41363017641.0)],
dtype=[('f0', '<i4'), ('f1', '<f8')])