DimensionalData.jl icon indicating copy to clipboard operation
DimensionalData.jl copied to clipboard

Broadcasting over a `DimArray` drops the name field

Open JoshuaBillson opened this issue 1 year ago • 6 comments

Problem:

I've found that broadcasting over a DimArray causes the name field to be set to Symbol("")

I find this behaviour to be quite unexpected. Moreover, it's difficult to feed a broadcasted array into a table, since the column names depend on the array name.

Minimal Example:

julia> array = DimArray(zeros(2, 2), (:X, :Y), name=:Band_1)
2×2 DimArray{Float64,2} Band_1 with dimensions: Dim{:X}, Dim{:Y}
 0.0  0.0
 0.0  0.0

julia> array.name
:Band_1

julia> (array .* 1.0).name
Symbol("")

JoshuaBillson avatar Dec 16 '23 00:12 JoshuaBillson

Yeah, there are for and against for this. I think the question isnt about broadcast but if similar keeps the name or not.

Often the name says something about the data that the broadcast totally changes. And raw similar contains no data and is completely uninitialised. So having a name is a bit odd.

But IDK, I also see your point. Im open to changing it. Maybe try a PR changing similar and see what breaks.

rafaqz avatar Dec 16 '23 00:12 rafaqz

You should rename it to the previous name, plus the operation applied.

lazarusA avatar Feb 05 '24 20:02 lazarusA

You should rename it to the previous name, plus the operation applied.

The use-case that prompted this issue was converting digital numbers into reflectance for satellite imagery. For example, calculating the surface-reflectance for Sentinel 2 imagery is done as follows:

sentinel .* 0.0001f0

In this case, if a layer has the name :B04, I would like it to retain this after converting it to reflectance. Otherwise, downstream code that depends on the band name will break. The current solution is to rely on rebuild to restore the original name, but this can be quite tedious in practice and is easy to forget about. Layer names could also get long after applying multiple operations in sequence, such as (landsat .* 0.0000275f0) .+ -0.2f0.

JoshuaBillson avatar Feb 06 '24 10:02 JoshuaBillson

This is the problem, what to do in this case:

mycustomfunction.(sentinel .* 0.0001f0 .* some_other_dim_array.^2) ./ yet_another_dim_array

Its a broadcast just like yours, and at the level of similar we cant tell which is which.

Doing what you want probably requires putting name rules into our broadcast mechanisms. Its possible, but not easy.

(Or we can let precedence win, maybe that will be predictable enough that users can rearrange their functions so the right name "wins" but even so should this still be called :B04 ?)

rafaqz avatar Feb 06 '24 11:02 rafaqz

I suppose that one could use modify in such cases, which will retain the name of the desired array. You could also extract the data of all arrays whose name you don't want to retain, so that only one operand is a DimArray.

Perhaps we should consider that broadcasting directly over a DimArray is a bad idea? As you pointed out, it can be difficult to determine the identity of the output array based solely on the input. It might be better to provide a method that requires users to specify the name of the output.

JoshuaBillson avatar Feb 06 '24 11:02 JoshuaBillson

Broadcasting is a huge use case of DimArray and arrays generally in Julia. Its how everything works internally in Rasters.jl - DiskArrays.jl has lazy broadcasting over big data so we leverage this.

We could (should) allow you to specify the name in broadcast_dims which we own - but it doesn't have the nice syntax of base broadcasting.

This is just a hard problem.

We want the best generic native julia syntax, and we want names to propagate intelligently.

rafaqz avatar Feb 06 '24 12:02 rafaqz