vaex
Countna()
The idea here is to get a quick missing values count.
I considered returning a dict, but eventually settled on a pandas Series for two main reasons: (a) it is similar to value_counts(), and (b) it is very easy to either sum it or call to_dict() on the result, which are the most common use cases.
Example
import vaex
import numpy as np
df = vaex.from_arrays(x=[1,2,np.nan], y=['a','b',None])
df.countna()
x 1
y 1
dtype: int64
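The two follow-up operations mentioned in the description (summing, and converting to a dict) can be sketched with plain pandas. The Series here is built by hand as a stand-in for the value returned by countna() in the example, so the snippet does not depend on this PR; the variable names are illustrative only.

```python
import pandas as pd

# Hand-built stand-in for the Series shown in the example output
# (assumption: countna() returns one integer count per column).
na_counts = pd.Series({"x": 1, "y": 1}, dtype="int64")

# Total number of missing values across all columns
total_missing = int(na_counts.sum())

# Plain dict keyed by column name
counts_by_column = na_counts.to_dict()

print(total_missing)
print(counts_by_column)
```

Both calls are one-liners on a Series, whereas a dict return value would need sum(d.values()) for the total.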
I am happy with this PR.
Long term we probably do not want to return a pandas object, but since the same goes for things like describe and value_counts, this is probably ok too.
The question is: do we want to also do the same for countnan and countmissing? Or maybe include them on demand?
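To make the question concrete, here is a rough sketch of how the three counts would differ on the data from the example. It uses plain Python and pandas rather than vaex, and it assumes the usual distinction: "nan" means a floating-point NaN, "missing" means a null or masked value (None here), and "na" means either. The helper functions are hypothetical and are not part of this PR.

```python
import math
import pandas as pd

# Same data as the example in the PR description
columns = {"x": [1, 2, float("nan")], "y": ["a", "b", None]}

def is_nan(value):
    # Only floats can be NaN; strings and None are never NaN
    return isinstance(value, float) and math.isnan(value)

def is_missing(value):
    # None stands in for a null/masked value in this sketch
    return value is None

def count_per_column(predicate):
    # One integer count per column, returned as a pandas Series
    return pd.Series(
        {name: sum(predicate(v) for v in values) for name, values in columns.items()}
    )

nan_counts = count_per_column(is_nan)
missing_counts = count_per_column(is_missing)
na_counts = nan_counts + missing_counts  # "na" is the union of the two

print(nan_counts.to_dict())
print(missing_counts.to_dict())
print(na_counts.to_dict())
```

Under these assumptions the NaN in x and the None in y fall into different counts, and only the combined count reports one missing value for each column.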