DataArrays.jl icon indicating copy to clipboard operation
DataArrays.jl copied to clipboard

Broadcasted not (.!) giving inconsistent return type when DataArray in expression

Open joshbode opened this issue 8 years ago • 14 comments

The broadcasted not operator .! (new in 0.6) is returning results with unexpected type when used directly on expressions containing a DataArray, e.g.

isa(.!isna.(x), DataArray) == true

whereas, if the BitArray result is stored in an intermediate variable:

y = isna.(x)
isa(.!y,  BitArray) == true

Example:

0.6.0-rc3.0> using DataArrays
0.6.0-rc3.0> x = DataArray(1:3);

0.6.0-rc3.0> isna.(x)
3-element BitArray{1}:
 false
 false
 false
# looks correct

0.6.0-rc3.0> .!isna.(x)
3-element DataArrays.DataArray{Bool,1}:
 true
 true
 true
# return has become a DataArray{Bool}

0.6.0-rc3.0> y = isna.(x); .!y
3-element BitArray{1}:
 true
 true
 true
# return is a BitArray as expected

Possible explanation - the code expansion looks rather more complicated than expected:

0.6.0-rc3.0> expand(:(.!f.(x)))
:($(Expr(:thunk, CodeInfo(:(begin
        $(Expr(:thunk, CodeInfo(:(begin
        global ##19#20
        const ##19#20
        $(Expr(:composite_type, Symbol("##19#20"), :((Core.svec)()), :((Core.svec)()), :(Core.Function), :((Core.svec)()), false, 0))
        return
    end))))
        $(Expr(:method, false, :((Core.svec)((Core.svec)(##19#20, Core.Any), (Core.svec)())), CodeInfo(:(begin
        #temp#@_3 = f(#temp#@_2)
        return !#temp#@_3
    end)), false))
        #19 = $(Expr(:new, Symbol("##19#20")))
        SSAValue(0) = #19
        return (Base.broadcast)(SSAValue(0), x)
    end)))))

versus:

0.6.0-rc3.0> expand(:(f.(x)))
:((Base.broadcast)(f, x))

0.6.0-rc3.0> expand(:(.!y))
:((Base.broadcast)(!, y))

which possibly suggests it is an upstream issue, independent of DataArrays.

joshbode avatar Jun 15 '17 06:06 joshbode

Two workarounds:

0.6.0-rc3.0> .!begin isna.(x) end
3-element BitArray{1}:
 true
 true
 true

0.6.0-rc3.0> .!(_ = isna.(x))
3-element BitArray{1}:
 true
 true
 true

joshbode avatar Jun 15 '17 06:06 joshbode

The complex code expansion I believe is expected since this is a fusing dot operation. Note that you get something similar with expand(:(a .+ b .- c)), for example.

As for this bug (and the fact that those workarounds work!), that is bizarre...

ararslan avatar Jun 15 '17 06:06 ararslan

I'm not sure the expansion is correct... it involves the expression-head :composite_type for example, which sounds like the dot might be being interpreted as field access perhaps? Don't know enough about the parser, but looks strange to me.

Also, for your example, the same trick works:

0.6.0-rc3.0> expand(:(a .+ begin b .- c end))
:($(Expr(:thunk, CodeInfo(:(begin
        # meta: location REPL[50] # REPL[50], line 1:
        # meta: pop location
        SSAValue(0) = (Base.broadcast)(-, b, c)
        return (Base.broadcast)(+, a, SSAValue(0))
    end)))))

Reported upstream, in any case - I don't think this is specifically a DataArrays issue.

joshbode avatar Jun 15 '17 06:06 joshbode

Indeed. Thanks for the report. In the meantime I'll see if there's anything we can do here to work around the weird return type.

ararslan avatar Jun 15 '17 06:06 ararslan

No worries. In the meantime, I guess !isna.(x) does work - though !(::BitArray) is now deprecated (which is how I got here in the first place!).

joshbode avatar Jun 15 '17 07:06 joshbode

broadcast(!, isna.(x)) has the right return type (though oddly enough broadcast(!isna, x) does not), so you could do that for now if you want to avoid the deprecation warnings.

ararslan avatar Jun 15 '17 07:06 ararslan

Good idea - thanks

joshbode avatar Jun 15 '17 07:06 joshbode

I think this is https://github.com/JuliaLang/julia/issues/19313. The problem is that multiple operations using the dot syntax are fused into a single one by the parser, so broadcast is not called on isna, but on x -> !isna(x), and the special-casing to return a BitArray rather than a DataArray{Bool} isn't used.

nalimilan avatar Jun 15 '17 13:06 nalimilan

Yep - looks like the same issue

joshbode avatar Jun 15 '17 22:06 joshbode

Unfortunately with fused broadcast there doesn't appear to be an easy way to specialize on a combination of functions, so we'll have to find another way to make .!isna.(x) return the desired BitArray...

ararslan avatar Jun 15 '17 22:06 ararslan

I don't think there's a solution with the current system unfortunately. Except undeprecating isna(x)...

nalimilan avatar Jun 16 '17 09:06 nalimilan

In the meantime, apart from not being able to entirely control the return vector type, I'm appreciating how awesome the broadcast-dot-syntax and loop fusion is :)

joshbode avatar Jun 16 '17 22:06 joshbode

I don't think we should remove the implicitly vectorized isna deprecation, since then we're in conflict with how things are done in Base. Until we can figure something out, be it a change in Base for 0.7 or some way to work around it in 0.6, we can just recommend map(!, isna.(x)) (equivalent and shorter than broadcast(!, isna.(x))), I guess. :/

ararslan avatar Jun 16 '17 23:06 ararslan

I don't think undeprecating vectorized isna is a real problem. Thanks to the deprecation, all places which could use the new syntax have been changed now, and people are familiar with it. So we could silently reintroduce it for this problematic case, and deprecate it after 0.7 will have provided a way to fix this. broadcast(!, isna.(x))) is kind of hard to discover.

nalimilan avatar Jun 17 '17 07:06 nalimilan