DataArrays.jl
DataArrays.jl copied to clipboard
The rabbit hole that is the data macro
Right now, the @data macro handles most things correctly except for variables that are equal to NA.
Consider this example:
a, b, c = 1, 2, NA
@data [a, b, c]
This will fail because the type of c isn't known from the surface analysis that the @data macro does. To fix this, we'd need to write out code that analyzes the values of the inputs, which can't be known at compile-time. So the macro needs to write code that does analysis at run-time.
Damn. In the discussion I started on julia-stats a while ago [1], @StefanKarpinski said that he thought [1,NA,2] should be possible at some point, using later-stage functions. Since this means @data wouldn't be needed anymore, I'd say just ignore this for now, and let people use the data() function.
1: https://groups.google.com/d/topic/julia-stats/1CYmgu1Lye0/discussion
I hope staged functions are sufficient. I am going to ignore this for now since I don't see a way forward without new tools in Base.
The right way to do this is to implement datahcat, datavcat, and datahvcat and make the data and pdata macros call these. I have this working, but 1) if we care about accurate type inference, it's a decent amount of code copy/pasted from Base with a few lines changed and 2) it's not currently working because of a bug in setindex!.
Oh great. I had naively assumed the *cat functions were builtins.
So much of our codebase is basically copy/pasted from other places that I've got no opposition to doing it again.
What's the bug in setindex!?
The bug is that setting a portion of a DataArray to be another DataArray fails if the latter has NAs. #62 has an easy fix that would unfortunately create a bunch of ambiguity warnings if applied to master, but it isn't the complete fix; we should also handle setting a portion of a DataArray to be an arbitrary AbstractDataArray.
(And after that change, my tests are still failing, now because of JuliaLang/julia#5448.)