
Countna()

Open · xdssio opened this issue 3 years ago · 1 comment

The idea here is to get a quick count of missing values per column.

I considered returning a dict, but eventually settled on a pandas Series for two main reasons: (a) it is similar to value_counts(), and (b) it is very easy to either sum the result or convert it with to_dict(), which are the most common use cases.

Example

import vaex
import numpy as np
df = vaex.from_arrays(x=[1,2,np.nan], y=['a','b',None])
df.countna()

x    1
y    1
dtype: int64
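
As a small illustration of point (b), and not part of the PR itself, the sketch below rebuilds the Series shown above by hand with plain pandas and applies the two follow-up operations mentioned. The variable names are made up for the example; only `pd.Series`, `.sum()` and `.to_dict()` are real pandas calls.

```python
import pandas as pd

# Hand-built stand-in for the Series that df.countna() prints above
# (assumption: one integer count per column, indexed by column name).
na_counts = pd.Series({"x": 1, "y": 1})

# Total number of missing values across all columns.
total_missing = int(na_counts.sum())

# Plain dict, e.g. for logging or JSON serialization.
as_dict = na_counts.to_dict()

print(total_missing)
print(as_dict)
```

Returning a Series keeps both operations one method call away, which is the convenience argued for above.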

xdssio, May 29 '21 19:05

I am happy with this PR.

Long term we probably do not want to return a pandas object, but since the same goes for things like describe and value_counts, this is probably OK too.

The question is whether we also want to do the same for countnan and countmissing, or maybe include them on demand?
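
To make the question concrete, here is a rough sketch in plain Python and pandas, not vaex code, of what a combined per-column result could look like. It assumes that "nan" means a floating-point NaN, "missing" means None, and "na" is the sum of both. The helper `count_by_kind` is hypothetical and exists only for this illustration.

```python
import math
import pandas as pd

def count_by_kind(columns):
    """Hypothetical helper: count NaN and None values separately per column."""
    rows = {}
    for name, values in columns.items():
        # NaN only makes sense for floats; strings and None are skipped here.
        nan = sum(1 for v in values if isinstance(v, float) and math.isnan(v))
        # Assumption for this sketch: "missing" means the value is None.
        missing = sum(1 for v in values if v is None)
        rows[name] = {"countnan": nan, "countmissing": missing, "countna": nan + missing}
    # One row per column, one column per kind of count.
    return pd.DataFrame.from_dict(rows, orient="index")

# Same data as in the example from the PR description.
counts = count_by_kind({"x": [1, 2, float("nan")], "y": ["a", "b", None]})
print(counts)
```

An "on demand" variant could instead take the kind of count as an argument and return a single Series, matching the shape countna() returns today.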

JovanVeljanoski, Jun 19 '21 22:06