DataArrays.jl
What should be the output of logical operators for PooledDataArray objects?
Currently it is another PooledDataArray of Bool elements (possible levels: true or false).
I wonder whether that really makes sense, or whether it should just be a DataArray of Bool. Some operations, such as element-wise logical operators, do not work with PooledDataArray, so the current behaviour is problematic. A short example follows (Julia 0.3.4 for Windows 64-bit, DataArrays 0.2.14).
julia> x = @pdata(["A","A","B","B"])
4-element PooledDataArray{ASCIIString,Uint32,1}:
"A"
"A"
"B"
"B"
julia> y = @data([1,2,1,3])
4-element DataArray{Int64,1}:
1
2
1
3
julia> x .== "A"
4-element PooledDataArray{Bool,Uint32,1}:
true
true
false
false
julia> y .< 2
4-element DataArray{Bool,1}:
true
false
true
false
julia> (x .== "A") & (y .< 2)
ERROR: `&` has no method matching &(::Array{Bool,1}, ::PooledDataArray{Bool,Uint32,1})
in & at D:\.julia\v0.3\DataArrays\src\operators.jl:543
I would say we should just special-case ban PooledDataArray{Bool}. If you wanted to use it as a factor, it already defines its own dummy representation. And it wastes storage, since a Bool costs less than any index into a pool of 2 values would.
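To put rough numbers on the storage point, here is a minimal sketch that uses only Base Julia and does not touch DataArrays itself. It compares the per-element payload of a Bool with that of a 32-bit pool index, the reference type shown in the transcript above. The type is spelled UInt32 in current Julia and Uint32 in 0.3. The variable names are illustrative, and the sketch ignores the pool itself and any missing-value mask.

```julia
# Per-element storage only; the pool and any NA mask are not counted.
n = 4                              # length of the example vectors above

bool_bytes  = n * sizeof(Bool)     # plain Bool values: 1 byte each
index_bytes = n * sizeof(UInt32)   # 32-bit pool indices: 4 bytes each

println("Bool values:    ", bool_bytes, " bytes")
println("UInt32 indices: ", index_bytes, " bytes")
```

A BitArray would be smaller still, since it packs one element per bit.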
Actually, the error here is rather that logical operators shouldn't return a PDA, but a standard Array{Bool} or a BitArray. We could ban PooledDataArray{Bool}, but that's a different issue.