YAXArrays.jl

Request: Return in-memory outputs by default for in-memory inputs

Open: briochemc opened this issue 2 months ago • 2 comments

How can I apply a function to in-memory inputs and get in-memory outputs? Currently, it takes a lot of extra work to make sure the outputs end up in memory.

MWE:

using YAXArrays

a = YAXArray(rand(10, 20, 5)) # in memory
b = YAXArray(rand(10, 20))    # in memory

2a                       # works (lazy) but...
2a |> readcubedata       # ... throws when read! (InexactError)
2.0a |> readcubedata     # in memory (but complicated syntax)
map(x->2x, a)            # in memory (but complicated syntax)

a .+ b                   # lazy (but simple syntax)
a .+ b |> readcubedata   # in memory (but complicated syntax)
xmap(+, a, b)            # lazy
map(+, a, b)             # throws! (ERROR: chunk sizes...)
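Until the default changes, one possible workaround is to wrap the "broadcast, then readcubedata" pattern from the MWE in a single call. This is a minimal sketch, assuming the v0.7 behaviour described in this issue, where broadcasting a YAXArray builds a lazy array and readcubedata reads it into memory. The helper name eager_broadcast is hypothetical and not part of YAXArrays.

```julia
using YAXArrays

# Hypothetical helper (not part of YAXArrays): broadcast `f` over the
# arguments, which builds a lazy array, then read the result into memory.
# `broadcast(f, args...)` goes through the same machinery as dot syntax.
eager_broadcast(f, args...) = readcubedata(broadcast(f, args...))

a = YAXArray(rand(10, 20, 5)) # in memory
b = YAXArray(rand(10, 20))    # in memory

c = eager_broadcast(+, a, b)    # same values as a .+ b, but in memory
d = eager_broadcast(*, 2.0, a)  # same values as 2.0a, but in memory
```

This keeps the call short, but it still builds the lazy array first, so it does not remove any per-operation overhead.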

I'd like to do simple in-memory broadcasts with the same concise syntax as for standard Julia Arrays. For example, 2a and a .+ b above should return in-memory YAXArrays.

(Note that the InexactError thrown when reading 2a is not really the concern here: I'd be happy with 2.0a giving me an in-memory array!)


EDIT: I may be missing some motivation behind the current behavior, but returning in-memory outputs when all inputs are in memory seems like a reasonable default to me.

briochemc, Oct 08 '25 03:10

The main motivation I see is that, with the new behaviour, you can chain operations defined across multiple lines of code and the computation only runs once you actually look at the data. But I agree that we should make it easier to trigger the computation, and possibly change this default behaviour. Another major problem I see for smaller in-memory arrays is the large overhead for simple operations: taking the sum of a YAXArray has a few seconds of overhead compared to taking the sum of the underlying data.
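A minimal sketch for checking that overhead yourself. It compares sum on the YAXArray wrapper with sum on its underlying data, using the a.data field that also appears later in this thread. Timings depend on the package version and the machine. The first call to each expression also includes Julia's compilation time, so it is worth running the block twice before comparing numbers.

```julia
using YAXArrays

a = YAXArray(rand(10, 20, 5)) # in memory

# Sum through the YAXArray wrapper; @time prints timing and returns the value
s_wrapped = @time sum(a)

# Sum of the underlying in-memory Array, bypassing the wrapper
s_raw = @time sum(a.data)
```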

In your example above, you could combine these two operations lazily and do the computation only once:

a = YAXArray(rand(10, 20, 5)) # in memory
b = YAXArray(rand(10, 20))    # in memory

a2 = 2.0a
a2plusb = a2 .+ b
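A minimal sketch of that suggestion, with the step that actually triggers the computation made explicit. It assumes the v0.7 behaviour described in this thread: the broadcasts only build lazy arrays, and readcubedata evaluates the whole chain once and returns an in-memory YAXArray.

```julia
using YAXArrays

a = YAXArray(rand(10, 20, 5)) # in memory
b = YAXArray(rand(10, 20))    # in memory

# Both lines only describe the computation; nothing is evaluated yet
a2 = 2.0a
a2plusb = a2 .+ b

# Evaluate the whole chain in a single pass and keep the result in memory
result = readcubedata(a2plusb)
```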

A few more remarks: the failure of 2a is most likely #539. I am also a bit confused by xmap(+, a, b), because its result is very different from what we get with map(+, a.data, b.data).

felixcremer, Oct 08 '25 04:10

As noted by @Balinus in #546, these also used to work in v0.6.1:

a = YAXArray(rand(10, 20, 5)) # in memory
b = YAXArray(rand(10, 20))    # in memory

2a                       # works (in memory)
a .+ b                   # works (in memory)

Also related to #545 (also broken by v0.7.0).

briochemc, Oct 09 '25 23:10