Wes McKinney

Results 203 comments of Wes McKinney

Since LGPL is incompatible with ASF projects (it's Category X, see https://www.apache.org/legal/resolved#category-x), MIT/Apache 2.0 would be better

We will also consistently want to track null counts, so having a single code path for doing anything with nulls would be nice (i.e. bitmaps everywhere). We can think some...
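To make the "bitmaps everywhere" idea concrete, here is a minimal pure-Python sketch of a validity bitmap with a null count derived from it. The function names `build_validity_bitmap` and `null_count`, and the least-significant-bit-first layout, are assumptions for illustration only, not the design that was settled on in the discussion.

```python
import math


def build_validity_bitmap(values):
    """Pack one validity bit per slot (1 = valid, 0 = null), LSB first.

    Hypothetical helper: None and float NaN are both treated as null.
    """
    bitmap = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        is_null = v is None or (isinstance(v, float) and math.isnan(v))
        if not is_null:
            bitmap[i // 8] |= 1 << (i % 8)
    return bitmap


def null_count(bitmap, length):
    """Null count = number of slots minus number of set (valid) bits.

    Padding bits in the last byte are always 0, so they never count as valid.
    """
    set_bits = sum(bin(byte).count("1") for byte in bitmap)
    return length - set_bits


values = [1.0, float("nan"), 3.0, None, 5.0]
bitmap = build_validity_bitmap(values)
n_null = null_count(bitmap, len(values))
```

With a single bitmap per array, the null count comes from the same code path regardless of the value type, which is the appeal mentioned above.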

My inclination on thinking some about this would be to use bitmaps for everything. The downside of this is that people have code littered with `s[...] = np.nan` they could...

@jreback we could make the design concession that NaN is treated as NA by default, so that these are all equivalent ``` from pandas import NA s[...] = NA s[...]...

@chrisaycock not sure, but my guess is the display will become NA rather than exposing the internal representation to the user

@jreback in setitem we can definitely map a set of sentinel values (NaN, NaT, None) to NA, it's more a memory representation question -- if incoming data has NaNs we...
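As a rough illustration of mapping sentinel values to NA in setitem, the sketch below routes `None` and float NaN to a cleared validity flag instead of storing them. `NAType`, `NA`, `is_na_sentinel`, and `MaskedValues` are hypothetical names, the validity mask is a plain list of booleans rather than a packed bitmap, and NaT is left out because detecting it needs the datetime library's own check.

```python
import math


class NAType:
    """Hypothetical singleton standing in for a library-level NA marker."""

    def __repr__(self):
        return "NA"


NA = NAType()


def is_na_sentinel(value):
    """Return True for values this sketch treats as missing: NA, None, float NaN."""
    if value is NA or value is None:
        return True
    return isinstance(value, float) and math.isnan(value)


class MaskedValues:
    """Fixed-length container that keeps validity separate from the data."""

    def __init__(self, length):
        self.data = [0.0] * length
        self.valid = [False] * length  # every slot starts out null

    def __setitem__(self, index, value):
        if is_na_sentinel(value):
            # Only the validity flag changes; the data slot is left untouched.
            self.valid[index] = False
        else:
            self.data[index] = value
            self.valid[index] = True

    def __getitem__(self, index):
        return self.data[index] if self.valid[index] else NA

    @property
    def null_count(self):
        return self.valid.count(False)


s = MaskedValues(3)
s[0] = 1.5
s[1] = float("nan")
s[2] = None
```

Under this assumption, assigning NaN, `None`, or `NA` all produce the same observable result, and reads of a null slot return `NA` rather than the internal representation.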

That's a good point. Mutating the null slots is safe, so that would save on memory / copying

I get somewhat different results on my machine (1.2 GHz Core m5, clang 8.0) ``` bash $ gcc -O3 -msse4.2 -std=c++1y -lc++ bitmap-test.cc 22:39 ~/code/tmp $ ./a.out N = 100000000...

My understanding is that AVX2 instructions are even faster for popcnt, so we can check larger blocks
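The idea of checking larger blocks can be sketched without any SIMD at all: count set bits one multi-byte word at a time instead of one byte at a time. This is only a pure-Python illustration of the blocking, with a hypothetical function name `popcount_blocks` and an assumed default block of 8 bytes; it says nothing about the actual speed of `popcnt` or AVX2 instructions.

```python
def popcount_blocks(bitmap, block_bytes=8):
    """Count set bits in a bytes-like bitmap, one block of bytes at a time.

    The final block may be shorter than block_bytes; int.from_bytes
    handles that without padding.
    """
    total = 0
    for start in range(0, len(bitmap), block_bytes):
        word = int.from_bytes(bitmap[start:start + block_bytes], "little")
        total += bin(word).count("1")
    return total


all_set = popcount_blocks(bytes([0xFF] * 10))      # 10 bytes, all bits set
sparse = popcount_blocks(bytes([0b00010101, 0]))   # three bits set
```

A block that is entirely zero or entirely ones can also be recognized with a single comparison, which is one reason wider blocks are attractive for null checks.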

FWIW, my 6-years-later digest of NumPy's problems with missing data is that NumPy's user base is too heterogeneous. pandas's user base of "analytics" users is a smaller [EDIT: I had...