DataArrays.jl
Allow hashing DataArrays with NA values
Without this, DataFrames.nonunique() and friends do not work on frames with NA rows.
This seems like a good strategy. I'm not sure why the old approach wouldn't have worked, though; it seems like it should have worked, just too slowly.
For me on v0.4 the old code was throwing InexactError or something like that when NAs were hashed.
I'm going to hold off merging for a bit to give others a chance to review, but the CI failure seems unrelated and this seems good to go.
It just occurred to me that it might be better to use findnext(BitVector) to skip NAs. I can resubmit an improved version.
Sounds good. I would do some profiling to make sure that it's worth the effort; for the almost no-NA case I imagine it will be meaningfully slower to use findnext.
OK, I've replaced the PR with the findnext() version. Both approaches should do more or less the same bit magic, so it's hard to say what would happen in the average "dense" case, but for the "sparse" case findnext() should be faster.
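For readers following along, here is a rough sketch of what scanning an NA mask with findnext() might look like. It is not the code from this PR: `hash_with_na`, `values`, and `isna` are hypothetical names, the mask is assumed to be `true` where a value is NA, and the sketch assumes Julia 1.x, where findnext() returns `nothing` when no further set bit exists (older versions returned 0).

```julia
# Sketch only: `hash_with_na`, `values`, and `isna` are hypothetical names,
# not DataArrays.jl API. Assumes Julia 1.x semantics for findnext.
function hash_with_na(values::Vector{Int}, isna::BitVector, h::UInt = zero(UInt))
    n = length(values)
    i = 1
    while i <= n
        # Jump straight to the next NA position instead of testing every element.
        nxt = findnext(isna, i)
        stop = nxt === nothing ? n : nxt - 1
        # Hash the run of present values that precedes that NA (may be empty).
        for j in i:stop
            h = hash(values[j], h)
        end
        # Mix in a marker so the position of each NA affects the result.
        if nxt !== nothing
            h = hash(:NA, h)
        end
        i = stop + 2  # resume just past the NA (or past the end of the array)
    end
    return h
end

vals = [1, 2, 0, 4]                            # the 0 is a placeholder under an NA
mask = BitVector([false, false, true, false])
# The placeholder under the NA is never read, so its value does not matter.
same = hash_with_na(vals, mask) == hash_with_na([1, 2, 99, 4], mask)
```

Whether this beats a per-element check depends on how the NAs are distributed, which is why profiling both versions is worthwhile.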