
index / slice return carray option

Open pyeguy opened this issue 10 years ago • 7 comments

When I am working with very large datasets, eval() doesn't play well, so I have to take a row-by-row approach:

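```python
import bcolz as bc

# prog presumably wraps the range in a progress bar; distmat, my_len, threshold and nn are defined elsewhere
for i in prog(range(my_len)):
    row = bc.carray(distmat[i])                        # wrap the row so .where() is available
    truth_row = bc.eval('row > {}'.format(threshold))  # boolean mask for this row
    # sum the selected values minus 1, reusing `row` instead of building a second carray
    nn.append(sum(row.where(truth_row)) - 1)
```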

But having to create a carray just to use the where() method is a bit clunky.
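If distmat is a 2-D carray, distmat[i] already comes back as a NumPy array, so one way to sidestep the extra carray is plain NumPy boolean indexing. This is a swapped-in NumPy technique, not a bcolz feature. The sketch assumes each row fits in memory (as it already must for bc.carray(distmat[i])), uses a hypothetical helper name, row_sums_above, and keeps the loop's sum(...) - 1 arithmetic unchanged. A small NumPy array stands in for the real distance matrix.

```python
import numpy as np

def row_sums_above(distmat, threshold):
    """Hypothetical helper: for each row, sum the values greater than
    `threshold`, minus 1 (mirrors the arithmetic of the loop above)."""
    nn = []
    for i in range(len(distmat)):
        row = np.asarray(distmat[i])     # integer indexing yields a NumPy row
        selected = row[row > threshold]  # boolean mask replaces carray.where()
        nn.append(float(selected.sum()) - 1)
    return nn


# Small stand-in for the real (much larger) distance matrix
distmat = np.array([[0.1, 0.5, 0.9],
                    [0.4, 0.2, 0.6]])
print(row_sums_above(distmat, 0.3))
```

The trade-off is that the mask and the selected values are materialized as NumPy arrays per row rather than iterated lazily, which is fine as long as a single row is small relative to memory.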

pyeguy, Jun 04 '15 22:06

I don't understand your question. Could you elaborate a little?

esc, Jun 05 '15 06:06

A self-contained example is always best to show your point.

FrancescAlted, Jun 05 '15 07:06

Sorry, here's a better example:

```
In [1]: import numpy as np

In [2]: import bcolz as bc

In [3]: b = bc.carray(np.random.rand(5,5))

In [4]: b
Out[4]:
carray((5, 5), float64)
  nbytes: 200; cbytes: 31.99 KB; ratio: 0.01
  cparams := cparams(clevel=5, shuffle=True, cname='blosclz')
[[ 0.00475486  0.21586652  0.40139066  0.48876736  0.02737035]
 [ 0.41732584  0.46331223  0.25571933  0.81353827  0.38294047]
 [ 0.44059723  0.77415085  0.45079994  0.25031323  0.27419751]
 [ 0.52612334  0.16658858  0.27212873  0.35714887  0.14922211]
 [ 0.42028842  0.84970211  0.82236438  0.82293391  0.6385532 ]]

In [5]: row_to_eval = b[0]

In [6]: eval = bc.eval('row_to_eval > 0.3')
```

Now if I try a list comprehension on that, it fails because the slice is a NumPy array, which has no where() method:

```
In [7]: [i for i in row_to_eval.where(eval)]
AttributeError                            Traceback (most recent call last)
in ()
----> 1 [i for i in row_to_eval.where(eval)]

AttributeError: 'numpy.ndarray' object has no attribute 'where'
```

versus wrapping the slice in a carray first:

```
In [8]: [i for i in bc.carray(row_to_eval).where(eval)]
Out[8]: [0.4013906620374499, 0.48876736290329625]
```

It's not a big deal, but for really big slices it would be nice to have them returned as carrays instead of NumPy arrays. Maybe as a cparam?
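Until slices can come back as carrays, a small user-side helper could hide the difference between the two cases. This is a sketch under assumptions: where_values is a hypothetical name, not part of bcolz; it assumes carray.where() accepts the same kind of mask used in In [8] above, and relies on slicing a carray with [:] returning a NumPy array. For plain NumPy slices it swaps in boolean indexing instead of building a carray. The example reuses the first row printed in the session above.

```python
import numpy as np

def where_values(arr, mask):
    """Hypothetical helper: values of `arr` where `mask` is True.

    Objects that provide .where() (such as a bcolz carray) use it directly;
    plain NumPy slices fall back to boolean indexing.
    """
    if hasattr(arr, "where"):
        return list(arr.where(mask))
    # mask[:] turns a carray mask into an ndarray and is a plain view for ndarrays
    mask = np.asarray(mask[:], dtype=bool)
    return np.asarray(arr)[mask].tolist()


row_to_eval = np.array([0.00475486, 0.21586652, 0.40139066, 0.48876736, 0.02737035])
print(where_values(row_to_eval, row_to_eval > 0.3))  # [0.40139066, 0.48876736]
```

Dispatching on hasattr(arr, "where") keeps the call site identical whether a slice comes back as an ndarray today or as a carray if an option like the one requested here is added later.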

pyeguy, Jun 05 '15 22:06

Okay. @FrancescAlted, would you like to see this feature?

esc, Jun 09 '15 17:06

Yes, I'd say that would be a useful addition. A sensible parameter for the carray/ctable constructors could be out_flavor, as already implemented in bcolz.eval (http://bcolz.blosc.org/reference.html?highlight=eval#bcolz.eval).

@vapemaster, would you like to contribute a pull request for this?

FrancescAlted, Jun 09 '15 17:06

I've never fooled around with Cython, but if you point me in the right direction I can give it a stab.

I agree that an out_flavor parameter would be an appropriate way to implement this.

pyeguy, Jun 10 '15 02:06

Well, Cython's syntax is very close to Python's, so I'd recommend a quick read of this first: http://docs.cython.org/src/tutorial/cython_tutorial.html

Then it would just be a matter of adding an out_flavor parameter to the carray constructor at https://github.com/Blosc/bcolz/blob/master/bcolz/carray_ext.pyx#L309 and storing its value in an internal attribute (self.out_flavor). After that, the result only needs to be converted into a carray when self.out_flavor is 'carray', at this line: https://github.com/Blosc/bcolz/blob/master/bcolz/carray_ext.pyx#L528
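The two steps described above (remember the flag in the constructor, convert in the indexing path) can be illustrated with a pure-Python toy. This is not bcolz code: ToyCArray is a hypothetical stand-in backed by a NumPy array, its where() only mimics carray.where(), and the 'numpy'/'carray' values are assumed from this thread. A real patch would live in carray_ext.pyx and build an actual carray.

```python
import numpy as np

class ToyCArray:
    """Hypothetical stand-in for a carray, used only to illustrate out_flavor."""

    def __init__(self, data, out_flavor="numpy"):
        if out_flavor not in ("numpy", "carray"):
            raise ValueError("out_flavor must be 'numpy' or 'carray'")
        self.data = np.asarray(data)
        # Step 1: remember the requested flavor on the instance
        self.out_flavor = out_flavor

    def __len__(self):
        return len(self.data)

    def where(self, mask):
        # Mimics carray.where(): yield the values whose mask entry is True
        mask = np.asarray(mask, dtype=bool)
        yield from self.data[mask].tolist()

    def __getitem__(self, key):
        result = self.data[key]
        # Step 2: wrap array results when the 'carray' flavor was requested;
        # scalar lookups still return a plain scalar
        if self.out_flavor == "carray" and isinstance(result, np.ndarray):
            return ToyCArray(result, out_flavor=self.out_flavor)
        return result


b = ToyCArray([[0.1, 0.5, 0.9], [0.4, 0.2, 0.6]], out_flavor="carray")
row = b[0]                              # a ToyCArray, not an ndarray
print(list(row.where(row.data > 0.3)))  # [0.5, 0.9]
```

Defaulting to "numpy" keeps existing behavior unchanged, and only array results are wrapped so that b[0, 1] still returns a scalar.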

Also, please don't forget to contribute some tests and to update the docstrings accordingly. Good luck!

FrancescAlted, Jun 10 '15 08:06