index / slice return carray option
When I am working with very large datasets, eval() doesn't play well, so I have to take a row-by-row approach:
for i in prog(range(my_len)):
    row = bc.carray(distmat[i])
    truth_row = bc.eval('row > {}'.format(threshold))
    nn.append((sum([j for j in bc.carray(distmat[i]).where(truth_row)])-1))
But having to create a carray just to use the where() function is a bit clunky.
I don't understand your question. Could you elaborate a little?
A self-contained example is always best to show your point.
Sorry, here's a better example:
In [1]: import numpy as np
In [2]: import bcolz as bc
In [3]: b = bc.carray(np.random.rand(5,5))
In [4]: b
Out[4]:
carray((5, 5), float64)
  nbytes: 200; cbytes: 31.99 KB; ratio: 0.01
  cparams := cparams(clevel=5, shuffle=True, cname='blosclz')
[[ 0.00475486  0.21586652  0.40139066  0.48876736  0.02737035]
 [ 0.41732584  0.46331223  0.25571933  0.81353827  0.38294047]
 [ 0.44059723  0.77415085  0.45079994  0.25031323  0.27419751]
 [ 0.52612334  0.16658858  0.27212873  0.35714887  0.14922211]
 [ 0.42028842  0.84970211  0.82236438  0.82293391  0.6385532 ]]
In [5]: row_to_eval = b[0]
In [6]: eval = bc.eval('row_to_eval > 0.3')
Now if I try to do a list comprehension on that, it fails because the slice is a NumPy array:
In [7]: [i for i in row_to_eval.where(eval)]
AttributeError                            Traceback (most recent call last)
in ()
----> 1 [i for i in row_to_eval.where(eval)]
AttributeError: 'numpy.ndarray' object has no attribute 'where'
versus wrapping the slice in a carray first:
In [8]: [i for i in bc.carray(row_to_eval).where(eval)]
Out[8]: [0.4013906620374499, 0.48876736290329625]
It's not a big deal, but it would be nice, especially for really big slices, to have them returned as carrays instead of NumPy arrays. Maybe as a cparam?
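For comparison, the same selection can be made on the NumPy slice directly with boolean indexing, without wrapping it in a carray. This is only a minimal sketch using the first row of the array shown in the session above; it is plain NumPy indexing, not a bcolz feature, and the mask is computed with NumPy rather than bc.eval.

```python
import numpy as np

# First row of the example array printed in the session above.
row_to_eval = np.array([0.00475486, 0.21586652, 0.40139066, 0.48876736, 0.02737035])

# Boolean mask computed with NumPy (stand-in for the bc.eval call).
mask = row_to_eval > 0.3

# Boolean indexing keeps only the elements where the mask is True.
selected = row_to_eval[mask].tolist()
print(selected)
```

This works for a slice that fits in memory; it does not help when the slice itself is large enough that a compressed carray would be preferable, which is the case the request is about.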
Okay. @FrancescAlted would you like to see this feature?
I would say that would be a useful addition, yes. A sensible parameter for the carray/ctable constructors could be out_flavor as it is already implemented in bcolz.eval (http://bcolz.blosc.org/reference.html?highlight=eval#bcolz.eval).
@vapemaster would you like to contribute a pull request for this?
I've never fooled around with Cython, but if you point me in the right direction I can give it a stab.
I agree that out_flavor would be an appropriate implementation.
Well, Cython is very close in syntax to Python, so my recommendation is that you have a quick read of this first: http://docs.cython.org/src/tutorial/cython_tutorial.html
Then it would just be a matter of adding an out_flavor parameter to the carray constructor at: https://github.com/Blosc/bcolz/blob/master/bcolz/carray_ext.pyx#L309 and storing the value in an internal attribute (self.out_flavor). After that, it is just a matter of converting the result into a carray when self.out_flavor is 'carray', at this line: https://github.com/Blosc/bcolz/blob/master/bcolz/carray_ext.pyx#L528
Also, please do not forget to contribute some tests and update the docstrings accordingly. Good luck!
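The two steps described above (store the flavor in the constructor, then convert on slicing) can be sketched in pure Python. This is only an illustration of the pattern under stated assumptions: ToyArray, its "list" and "toyarray" flavor names, and its where() method are hypothetical stand-ins, not the bcolz carray or its actual code in carray_ext.pyx.

```python
class ToyArray:
    """Hypothetical list-backed container; NOT the bcolz carray."""

    def __init__(self, data, out_flavor="list"):
        if out_flavor not in ("list", "toyarray"):
            raise ValueError("out_flavor must be 'list' or 'toyarray'")
        self._data = list(data)
        # Step 1: remember the requested flavor in an internal attribute.
        self.out_flavor = out_flavor

    def __getitem__(self, key):
        result = self._data[key]
        # Step 2: wrap slice results when the container flavor is requested.
        # Single elements are returned unchanged.
        if isinstance(key, slice) and self.out_flavor == "toyarray":
            return ToyArray(result, out_flavor=self.out_flavor)
        return result

    def where(self, mask):
        # Yield the values whose corresponding mask entry is true.
        return (value for value, keep in zip(self._data, mask) if keep)


rows = ToyArray([0.1, 0.5, 0.2, 0.9], out_flavor="toyarray")
part = rows[1:4]  # stays a ToyArray, so where() is available
picked = list(part.where([True, False, True]))
print(picked)
```

With the default flavor the slice comes back as a plain list, which mirrors the current behaviour reported in this thread, where a slice loses the where() method.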