DataArrays.jl icon indicating copy to clipboard operation
DataArrays.jl copied to clipboard

The rabbit hole that is the data macro

Open johnmyleswhite opened this issue 11 years ago • 6 comments

Right now, the @data macro handles most things correctly except for variables that are equal to NA.

Consider this example:

a, b, c = 1, 2, NA
@data [a, b, c]

This will fail because the type of c isn't known from the surface analysis that the @data macro does. To fix this, we'd need to write out code that analyzes the values of the inputs, which can't be known at compile-time. So the macro needs to write code that does analysis at run-time.

johnmyleswhite avatar Jan 19 '14 00:01 johnmyleswhite

Damn. In the discussion I started on julia-stats a while ago [1], @StefanKarpinski said that he thought [1,NA,2] should be possible at some point, using later-stage functions. Since this means @data wouldn't be needed anymore, I'd say just ignore this for now, and let people use the data() function.

1: https://groups.google.com/d/topic/julia-stats/1CYmgu1Lye0/discussion

nalimilan avatar Jan 19 '14 10:01 nalimilan

I hope staged functions are sufficient. I am going to ignore this for now since I don't see a way forward without new tools in Base.

johnmyleswhite avatar Jan 19 '14 15:01 johnmyleswhite

The right way to do this is to implement datahcat, datavcat, and datahvcat and make the data and pdata macros call these. I have this working, but 1) if we care about accurate type inference, it's a decent amount of code copy/pasted from Base with a few lines changed and 2) it's not currently working because of a bug in setindex!.

simonster avatar Jan 19 '14 23:01 simonster

Oh great. I had naively assumed the *cat functions were builtins.

So much of our codebase is basically copy/pasted from other places that I've got no opposition to doing it again.

What's the bug in setindex!?

johnmyleswhite avatar Jan 20 '14 01:01 johnmyleswhite

The bug is that setting a portion of a DataArray to be another DataArray fails if the latter has NAs. #62 has an easy fix that would unfortunately create a bunch of ambiguity warnings if applied to master, but it isn't the complete fix; we should also handle setting a portion of a DataArray to be an arbitrary AbstractDataArray.

simonster avatar Jan 20 '14 01:01 simonster

(And after that change, my tests are still failing, now because of JuliaLang/julia#5448.)

simonster avatar Jan 20 '14 01:01 simonster