arkouda
make unique and <Series>.value_counts() work with boolean dtype
IMO, this should not require a call into the server `value_counts` message.
This could be achieved with a `True`-reduction and a subtraction from the length to get the counts for `False`.
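The `True`-reduction idea can be sketched as follows. NumPy stands in for an arkouda `pdarray` so the example runs without a server. `bool_value_counts` is a hypothetical name, and the sketch relies only on a sum reduction and the array length. It returns the same `(keys, counts)` layout that `GroupBy.count()` prints later in this thread.

```python
import numpy as np


def bool_value_counts(values):
    """Count False/True entries with one sum reduction instead of a grouping pass.

    NumPy is a stand-in for an arkouda pdarray here (an assumption for illustration).
    """
    if values.dtype != np.bool_:
        raise TypeError(f"expected a bool array, got {values.dtype}")
    n_true = int(values.sum())            # True-reduction: each True counts as 1
    n_false = int(values.size) - n_true   # everything that is not True is False
    # Keys in sorted order (False, True), matching GroupBy.count() output
    return np.array([False, True]), np.array([n_false, n_true])


keys, counts = bool_value_counts(np.array([True, False, True, True]))
```

The design choice is that a boolean column has at most two distinct values, so the counts follow from a single reduction and no grouping step is needed at all.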
@kellyjoy15, I was not having any issues using boolean values in a series running `.value_counts()`. `GroupBy`, which handles the computations here, was updated in the latest release to handle boolean dtype. Is it possible you're not using the latest release, or would you still like this functionality to be modified?
Here's my test case and output:
In [35]: s
Out[35]:
0 False
1 True
2 True
3 True
4 False
5 True
6 True
7 True
8 False
9 False
10 True
11 True
12 False
13 True
14 False
15 False
16 True
17 True
18 False
19 False
20 False
21 False
22 False
23 False
24 True
dtype: bool
In [40]: s.value_counts()
Out[40]:
False 13
True 12
dtype: int64
So, I am running v2022.04.15.
`GroupBy` and `groupby` work with bools in that version, so I don't know if it is still a problem.
Here is my reproducer, along with other things I tried and the result:
a = ak.array([True, False, True,True])
s = ak.Series(a)
ak.value_counts(s)
# TypeError: type of argument "pda" must be arkouda.pdarrayclass.pdarray; got arkouda.series.Series instead
s.value_counts()
# TypeError: type of argument "pda" must be arkouda.pdarrayclass.pdarray; got numpy.bool_ instead
ak.value_counts(a)
# RuntimeError: Error: unique: bool not implemented
a.value_counts()
# AttributeError: 'pdarray' object has no attribute 'value_counts'
s.unique()
# AttributeError: 'Series' object has no attribute 'unique'
ak.unique(s)
# TypeError: must be pdarray, Strings, or Categorical {}
a.unique()
# AttributeError: 'pdarray' object has no attribute 'unique'
ak.unique(a)
# RuntimeError: Error: unique: bool not implemented
@kellyjoy15, this got a bit long, so TL;DR:
- Workaround for the meantime:
  >>> a = ak.array([True, False, True, True])
  >>> g = ak.GroupBy(a)
  >>> g.count()
  (array([False True]), array([1 3]))
- `Series` expects a tuple of pdarrays, i.e. `s = ak.Series((ak.arange(a.size), a))`
- `unique` doesn't support `bool` arrays in `'v2022.04.15'`; this is fixed in `'v2022.05.05'`
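Since whether `ak.unique` and `ak.value_counts` accept `bool` depends on the server release, a script that has to run against both releases could check the version first. This is a minimal sketch: `parse_arkouda_version` and `supports_bool_unique` are hypothetical helpers, and they assume release tags of the plain `vYYYY.MM.DD` form shown by `ak.get_config()['arkoudaVersion']` in this thread.

```python
def parse_arkouda_version(tag):
    """Turn a release tag like 'v2022.05.05' into (2022, 5, 5) for comparison.

    Assumes the plain vYYYY.MM.DD form; tags with suffixes are not handled.
    """
    return tuple(int(part) for part in tag.lstrip("v").split("."))


# First release this thread reports as supporting bool in unique/value_counts
BOOL_UNIQUE_MIN = parse_arkouda_version("v2022.05.05")


def supports_bool_unique(tag):
    """True if the given release tag is at or after the bool-unique fix."""
    return parse_arkouda_version(tag) >= BOOL_UNIQUE_MIN
```

With a connected client, the check would presumably be `supports_bool_unique(ak.get_config()['arkoudaVersion'])`, choosing `ak.value_counts(a)` when it returns `True` and the `ak.GroupBy(a).count()` workaround otherwise.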
I think there are a few things at play here:
- The input is not what `Series` expects: https://github.com/Bears-R-Us/arkouda/blob/67a5b542e28647bcc5d4b283b8f3e4bcfc006191/arkouda/series.py#L74-L77
  So when `ak.Series(a)` is called, it expects `index` to be the first element and `values` to be the second element. So `index` is set to `True` and `values` is set to `False`:
  >>> a = ak.array([True, False, True, True])
  >>> s = ak.Series(a)
  >>> s.index
  True
  >>> s.values
  False
  >>> s
  AttributeError: 'numpy.bool_' object has no attribute 'to_ndarray'
  This explains the `# TypeError: type of argument "pda" must be arkouda.pdarrayclass.pdarray; got numpy.bool_ instead`. This is similar to the issue seen in #1267, and we need more robust type checking to raise an error when the `Series` is initialized.
- `unique` doesn't support `bool` in `'v2022.04.15'`. This was fixed and is in today's release, `'v2022.05.05'`, and will hopefully make its way to you soon.
  >>> ak.get_config()['arkoudaVersion']
  'v2022.04.15'
  >>> a = ak.array([True, False, True, True])
  >>> s = ak.Series((ak.arange(a.size), a))
  >>> s.index
  array([0 1 2 3])
  >>> s.values
  array([True False True True])
  >>> s.value_counts()
  RuntimeError: Error: unique: bool not implemented
  >>> ak.value_counts(a)
  RuntimeError: Error: unique: bool not implemented
  >>> ak.unique(a)
  RuntimeError: Error: unique: bool not implemented
- The rest of the errors appear to be attribute methods that aren't implemented or functions that don't accept `Series` as a parameter type. I'll leave #1352 to decide if we should modify `groupable` types to accept `Series`.
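Until #1352 is decided, one way around the `TypeError` is to hand the function the Series' underlying values array instead of the Series itself. The sketch below shows that unwrapping with NumPy so it runs without a server. `SeriesLike`, `as_values`, and `value_counts_any` are hypothetical names, and `np.unique(..., return_counts=True)` stands in for `ak.value_counts`, which in `'v2022.05.05'` returns the same `(keys, counts)` pair. Against a real server, the direct equivalent would presumably be `ak.value_counts(s.values)`.

```python
import numpy as np


class SeriesLike:
    """Hypothetical stand-in for ak.Series: it only carries .index and .values."""

    def __init__(self, index, values):
        self.index = index
        self.values = values


def as_values(obj):
    """Return obj.values for Series-like objects, otherwise obj unchanged."""
    # Duck typing: anything exposing both .index and .values is treated as a Series
    if hasattr(obj, "index") and hasattr(obj, "values"):
        return obj.values
    return obj


def value_counts_any(obj):
    """Accept either a plain array or a Series-like object."""
    # np.unique stands in for ak.value_counts: both return (unique keys, counts)
    return np.unique(as_values(obj), return_counts=True)


a = np.array([True, False, True, True])
s = SeriesLike(np.arange(a.size), a)
keys, counts = value_counts_any(s)
```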
In the latest release it looks like this (forgive me for the reordering; I tried to group them in a way that was logical to me):
>>> ak.get_config()['arkoudaVersion']
'v2022.05.05'
>>> a = ak.array([True, False, True,True])
>>> s = ak.Series((ak.arange(a.size), a))
>>> s
0 True
1 False
2 True
3 True
dtype: bool
# only way that works in 'v2022.04.15'
>>> ak.GroupBy(a).count()
(array([False True]), array([1 3]))
# works only in 'v2022.05.05'
>>> s.value_counts()
True 3
False 1
dtype: int64
>>> ak.value_counts(a)
(array([False True]), array([1 3]))
>>> ak.unique(a)
array([False True])
# methods not implemented
>>> s.unique()
AttributeError: 'Series' object has no attribute 'unique'
>>> a.unique()
AttributeError: 'pdarray' object has no attribute 'unique'
>>> a.value_counts()
AttributeError: 'pdarray' object has no attribute 'value_counts'
# Series not currently an accepted type
>>> ak.value_counts(s)
TypeError: type of argument "pda" must be arkouda.pdarrayclass.pdarray; got arkouda.series.Series instead
>>> ak.unique(s)
TypeError: <class 'arkouda.series.Series'> does not support grouping
It's worth noting that what you expected to happen is totally reasonable, as that's how pandas handles it. We should probably update `Series` to automatically make the index if only values are passed in:
>>> pd.Series([True,False,True,True])
0 True
1 False
2 True
3 True
dtype: bool
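The two `Series` points in this thread (raising an error early on bad input, and building a default index the way pandas does when only values are given) can be sketched together. This is a hypothetical helper, `normalize_series_input`, written against NumPy arrays; it is not arkouda's actual constructor. It accepts either an `(index, values)` tuple or a bare 1-D values array and raises `TypeError` for anything else.

```python
import numpy as np


def normalize_series_input(data):
    """Hypothetical sketch: return (index, values) for a Series-like constructor.

    Accepts an (index, values) tuple, or a bare 1-D values array, in which
    case a default 0..n-1 index is built the way pandas does.
    """
    if isinstance(data, tuple):
        if len(data) != 2:
            raise TypeError(f"expected an (index, values) tuple, got {len(data)} items")
        index, values = (np.asarray(part) for part in data)
    else:
        values = np.asarray(data)
        index = np.arange(values.size)  # default index, like pd.Series(values)
    # Fail at construction time instead of later with a confusing error
    if values.ndim != 1 or index.ndim != 1:
        raise TypeError("index and values must both be 1-D arrays")
    if index.size != values.size:
        raise ValueError("index and values must have the same length")
    return index, values


index, values = normalize_series_input(np.array([True, False, True, True]))
```

With this check, the bare-array call from the original report would get a default index instead of silently treating `a[0]` and `a[1]` as the index and values.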
This discrepancy is captured in #1363.
@pierce314159 checked and this is now functional.