NamedArrays.jl

support reshape

Open schlichtanders opened this issue 5 years ago • 3 comments

It seems that many array operations are already supported, but not reshape, which is quite central.

Especially in combination with broadcasting, reshape(a, 1, :) is a common way to move a vector into the second dimension, as is reshape(a, 1, 1, :) for the third dimension, and so forth.
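To make the use case concrete, here is a minimal sketch using plain Base arrays (no NamedArrays involved); the variable names are only illustrative. It shows how reshape moves a vector into the second or third dimension so that broadcasting combines it with another vector along a different axis.

```julia
a = [1, 2, 3]      # lives along dimension 1
b = [10, 20]       # will be moved into dimension 2

# reshape(b, 1, :) gives a 1×2 matrix; broadcasting against the
# 3-element vector `a` then yields a 3×2 matrix of all pairwise sums.
row = reshape(b, 1, :)
sums = a .+ row

# reshape(b, 1, 1, :) moves the same data into the third dimension.
depth = reshape(b, 1, 1, :)

println(size(row))    # (1, 2)
println(size(sums))   # (3, 2)
println(size(depth))  # (1, 1, 2)
```

The request in this issue is for the same calls to work when a is a NamedArray, ideally while keeping its names.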

schlichtanders Jun 23 '20 12:06

I agree that support for a more general reshape would be nice. I might look into implementing that. In the meantime, transpose is already supported:

julia> n = NamedArray(1:5)
5-element Named UnitRange{Int64}
A  │ 
───┼──
1  │ 1
2  │ 2
3  │ 3
4  │ 4
5  │ 5

julia> t = transpose(n)
1×5 Named Base.ReshapedArray{Int64,2,UnitRange{Int64},Tuple{}}
_ ╲ A │ 1  2  3  4  5
──────┼──────────────
1     │ 1  2  3  4  5

julia> transpose(t)
5×1 Named Array{Int64,2}
A ╲ _ │ 1
──────┼──
1     │ 1
2     │ 2
3     │ 3
4     │ 4
5     │ 5

dietercastel Jul 01 '20 07:07

The main question with all these operations is how to choose the index names. Sometimes this is clear (as in the case of transpose). In other situations it is not trivial, and names might have to be conjured up.
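As an illustration of the naming problem, here is a rough sketch of a user-side workaround for the reshape(a, 1, :) case. The helper rowvector is hypothetical and not part of NamedArrays.jl. The sketch assumes the three-argument constructor NamedArray(array, names, dimnames), the accessors names(n, d) and dimnames(n, d), and the array field behave as described in the package README. The existing names are carried over to the second dimension, while the name "1" and the dimension name "_" for the new singleton dimension are simply made up.

```julia
using NamedArrays

# Hypothetical helper (not part of NamedArrays.jl): turn a named vector
# into a 1×N named matrix, keeping the original names on dimension 2.
function rowvector(n::NamedArray{T,1}) where {T}
    data = reshape(n.array, 1, :)            # reshape the underlying data
    newnames = (["1"], names(n, 1))          # "1" is an invented name
    newdims = ("_", dimnames(n, 1))          # "_" is an invented dimension name
    return NamedArray(data, newnames, newdims)
end

n = NamedArray([10, 20, 30], (["a", "b", "c"],), ("letters",))
r = rowvector(n)
println(size(r))
```

Note that the invented name is a String here only because the original names are Strings; for other index types a different choice would be needed, which is exactly the difficulty described above.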

There is infrastructure for that (NamedArrays.defaultnames()), but the names it produces are of type String, which might differ from the original index type. That type is basically unrestricted: names can even be Int, which can be confusing at times.

I believe changing the index types means type instability, and people hate that (and I don't understand type stability well enough to do anything about it).

Sometimes I wonder whether we should rename the current NamedArray to GeneralNamedArray or something, and then have the normal NamedArray be a special version with dimension name type Symbol and index type String. If that were the case, we could do a lot more clever name calculations or renaming of the indices.

davidavdav Jul 01 '20 09:07

NamedArrays is currently the go-to solution when you want to use strings/symbols instead of integer indexing.

Especially if you come from R, you are used to having named indices everywhere (of course, there are limitations in R too, but at least names are supported by base R). Hence I strongly agree with you about having a special version for Symbol.

I would choose Symbols as the default, as Symbol indexing should be much faster, if I understand the underlying implementation correctly. For a Symbol there is only ever one instance per name, while for a String there can be multiple distinct String instances with the same content. Hence hashing and equality checks should be much faster for Symbol.
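The interning argument can be illustrated with a small sketch in plain Julia. The variable names are illustrative, and the snippet makes no claim about how NamedArrays stores its indices internally or about actual timings; it only shows that a Symbol built twice from the same name is the identical object, while Strings are matched by content.

```julia
# Symbols are interned: building the same name in two different ways
# yields one and the same object, so comparison is an identity check.
s1 = Symbol("row", 1)   # concatenates its arguments into :row1
s2 = Symbol("row1")
println(s1 === s2)      # true

# Strings with equal content compare equal, but the comparison (and the
# hash) has to look at the characters.
a = "row1"
b = string("row", 1)
println(a == b)         # true

# Both kinds of key work for name-to-position lookups.
bysymbol = Dict(:row1 => 1, :row2 => 2)
bystring = Dict("row1" => 1, "row2" => 2)
println(bysymbol[s1] == bystring[b])   # true
```

Whether this translates into a measurable speedup for indexing would need benchmarking against the actual implementation.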

schlichtanders Jul 04 '20 12:07