Oceananigans.jl icon indicating copy to clipboard operation
Oceananigans.jl copied to clipboard

Add time to `Field` generated from `FieldTimeSeries`

Open glwagner opened this issue 9 months ago • 27 comments

This PR illustrates how to add time information to a Field when it is created by indexing into a FieldTimeSeries.

glwagner avatar Feb 26 '25 22:02 glwagner

@tomchor do you need this?

glwagner avatar Mar 11 '25 16:03 glwagner

Yes I think this would be good. I was waiting for test to pass before reviewing, but let me know if you'd like me to review it now.

tomchor avatar Mar 11 '25 16:03 tomchor

This PR just illustrates how to do it, so I need a review to know if it is worth pursuing.

glwagner avatar Mar 11 '25 16:03 glwagner

Also the review can help resolve failing tests btw! I don't think we should have a policy that reviews wait for passing tests. It would make things way slower plus its a bit anti-collaborative

glwagner avatar Mar 11 '25 16:03 glwagner

Sounds good. I'll take a closer look at it soon.

I have historically waited for the point where either reviews are requested or when there are only one or two failing tests left for review. But I'll start reviewing PRs sooner then.

tomchor avatar Mar 11 '25 16:03 tomchor

I mean, I guess it's ultimately a personal choice, but I would just say that we're all in this together. I think we have worked together on many PRs starting much earlier than passing tests (eg writing the whole PR collaboratively for example).

glwagner avatar Mar 11 '25 17:03 glwagner

PR looks good in the sense that it introduces FixedTime. But it doesn't actually seem to implement the FixedTime as status when slicing FieldTimeSeries, correct? e.g.:

julia> ω_timeseries
128×128×1×84 FieldTimeSeries{InMemory} located at (Face, Face, Center) of ω at two_dimensional_turbulence.jld2
├── grid: 128×128×1 RectilinearGrid{Float64, Periodic, Periodic, Flat} on CPU with 3×3×0 halo
├── indices: (:, :, :)
├── time_indexing: Linear()
├── backend: InMemory()
├── path: two_dimensional_turbulence.jld2
├── name: ω
└── data: 134×134×1×84 OffsetArray(::Array{Float64, 4}, -2:131, -2:131, 1:1, 1:84) with eltype Float64 with indices -2:131×-2:131×1:1×1:84
    └── max=37.8946, min=-36.1335, mean=-0.00684511

julia> ω_timeseries[2].status

Based on the the word "illustrates" in first comment I see that might be intentional. But I'm assuming we want to actually make this work before merging, correct?

tomchor avatar Mar 12 '25 15:03 tomchor

why doesn't

julia> ω_timeseries[2].status

work as expected?

glwagner avatar Mar 12 '25 15:03 glwagner

But I'm assuming we want to actually make this work before merging, correct?

Of course we do. I need your help.

glwagner avatar Mar 12 '25 15:03 glwagner

But I'm assuming we want to actually make this work before merging, correct?

Of course we do. I need your help.

I was a bit confused because you said this PR "illustrates" how to add time information. I didn't have time to investigate further because I'm pretty swamped until Thursday, but I'll look into it on Friday and try to make it work.

tomchor avatar Mar 12 '25 16:03 tomchor

But I'm assuming we want to actually make this work before merging, correct?

Of course we do. I need your help.

I was a bit confused because you said this PR "illustrates" how to add time information. I didn't have time to investigate further because I'm pretty swamped until Thursday, but I'll look into it on Friday and try to make it work.

Question 1: is it useful to have time information in field.status? Comments about this design?

glwagner avatar Mar 12 '25 16:03 glwagner

I extended the implementation to try to support your use case @tomchor. My original implementation, which was meant just to show the source code changes required, only supported backend = OnDisk().

It's still an illustration because we need tests to merge this. I will add tests once we agree on the design.

glwagner avatar Mar 12 '25 16:03 glwagner

But I'm assuming we want to actually make this work before merging, correct?

Of course we do. I need your help.

I was a bit confused because you said this PR "illustrates" how to add time information. I didn't have time to investigate further because I'm pretty swamped until Thursday, but I'll look into it on Friday and try to make it work.

Question 1: is it useful to have time information in field.status? Comments about this design?

I'm not sure if there's a better place for that information, but I think having that information somewhere within Field is pretty useful, yes. Putting it in status seems good to me!

This is a separate discussion, but I think ideally in the future we'd have situations where we'd be able to do something like

julia> ω_timeseries
128×128×1×84 FieldTimeSeries{InMemory} located at (Face, Face, Center) of ω at two_dimensional_turbulence.jld2
├── grid: 128×128×1 RectilinearGrid{Float64, Periodic, Periodic, Flat} on CPU with 3×3×0 halo
├── indices: (:, :, :)
├── time_indexing: Linear()
├── backend: InMemory()
├── path: two_dimensional_turbulence.jld2
├── name: ω
└── data: 134×134×1×84 OffsetArray(::Array{Float64, 4}, -2:131, -2:131, 1:1, 1:84) with eltype Float64 with indices -2:131×-2:131×1:1×1:84
    └── max=37.8946, min=-36.1335, mean=-0.00684511

julia> field = ω_timeseries[2, 10]
1×128×1 Field at time = FixedTime(0.6) and x = 0.8 #...
# Or something like that (possibly using WindowedFields

We have the information after all, so I don't see why we wouldn't cascade it down to subsets.

tomchor avatar Mar 12 '25 18:03 tomchor

I extended the implementation to try to support your use case @tomchor. My original implementation, which was meant just to show the source code changes required, only supported backend = OnDisk().

It's still an illustration because we need tests to merge this. I will add tests once we agree on the design.

Nice, thanks! Can confirm that it works as expected with OnDisk().

tomchor avatar Mar 12 '25 18:03 tomchor

This is a separate discussion, but I think ideally in the future we'd have situations where we'd be able to do something like

I don't completely understand the abstraction you're proposing, can you elaborate?

glwagner avatar Mar 12 '25 20:03 glwagner

This is a separate discussion, but I think ideally in the future we'd have situations where we'd be able to do something like

I don't completely understand the abstraction you're proposing, can you elaborate?

I'm not proposing anything other than what's already being done for this PR at the moment. But what I was discussing was the possibility that not only the time information, but also spatial information can be passed down from Fields and FieldTimeSeries to whatever subset of that Field we create via indexing and slicing. This PR already implements the time-information part of it :)

Hopefully that makes it clearer, but lmk if it doesn't. I don't quite know a good way to explain it.

tomchor avatar Mar 28 '25 02:03 tomchor

This is a separate discussion, but I think ideally in the future we'd have situations where we'd be able to do something like

I don't completely understand the abstraction you're proposing, can you elaborate?

I'm not proposing anything other than what's already being done for this PR at the moment. But what I was discussing was the possibility that not only the time information, but also spatial information can be passed down from Fields and FieldTimeSeries to whatever subset of that Field we create via indexing and slicing. This PR already implements the time-information part of it :)

Hopefully that makes it clearer, but lmk if it doesn't. I don't quite know a good way to explain it.

Field already inherits the location and grid from FieldTimeSeries. What other spatial information do you need?

glwagner avatar Apr 18 '25 19:04 glwagner

This is a separate discussion, but I think ideally in the future we'd have situations where we'd be able to do something like

I don't completely understand the abstraction you're proposing, can you elaborate?

I'm not proposing anything other than what's already being done for this PR at the moment. But what I was discussing was the possibility that not only the time information, but also spatial information can be passed down from Fields and FieldTimeSeries to whatever subset of that Field we create via indexing and slicing. This PR already implements the time-information part of it :) Hopefully that makes it clearer, but lmk if it doesn't. I don't quite know a good way to explain it.

Field already inherits the location and grid from FieldTimeSeries. What other spatial information do you need?

For this PR, I think what you're doing is already great and I'd be happy to merge once there's a test there.

My other comments have to do with the fact that if you index a FieldTimeSeries in any dimension that it's not time, then you lose all information about the grid. For example:

julia> u[10][1, 2:4, 2]
3-element Vector{Float64}:
 0.0027635785410754453
 0.0022795491934685884
 0.0019277783604979431

There's no way to know where that data is located. As a comparison, if I do the same thing in xarray (with a different file) I still get information about where the data is in the grid and time:

In [12]: ds.u.isel(time=10, xF=1, yC=range(2, 5), zC=2)
Out[12]: 
<xarray.DataArray 'u' (yC: 3)>
array([14.13936664, 19.20323208, 20.05029434])
Coordinates:
    zC       float64 0.1562
    xF       float64 0.1963
  * yC       (yC) float64 0.4909 0.6872 0.8836
    time     float64 10.0
Attributes:
    units:      m/s
    long_name:  Velocity in the x-direction

This is one of the points I tried to discuss in https://github.com/CliMA/Oceananigans.jl/discussions/4364. I do think this PR is a nice and straightforward first step.

tomchor avatar Apr 21 '25 15:04 tomchor

Instead of

u[10][1, 2:4, 2]

try

view(u[10], 1, 2:4, 2)

glwagner avatar Apr 22 '25 15:04 glwagner

That's cool! Can we make getindex() behave like that for Fields by default? view(u[10], 1, 2:4, 2) is not super intuitive.

tomchor avatar Apr 22 '25 20:04 tomchor

That's cool! Can we make getindex() behave like that for Fields by default? view(u[10], 1, 2:4, 2) is not super intuitive.

I am hesitant to do this because it leads to different behavior than in Base julia. This behavior follows directly from our definition of getindex, which in turn is widely used in all kernels.

glwagner avatar Apr 22 '25 20:04 glwagner

That's cool! Can we make getindex() behave like that for Fields by default? view(u[10], 1, 2:4, 2) is not super intuitive.

I am hesitant to do this because it leads to different behavior than in Base julia. This behavior follows directly from our definition of getindex, which in turn is widely used in all kernels.

What about interior()?

tomchor avatar Apr 22 '25 20:04 tomchor

That's cool! Can we make getindex() behave like that for Fields by default? view(u[10], 1, 2:4, 2) is not super intuitive.

I am hesitant to do this because it leads to different behavior than in Base julia. This behavior follows directly from our definition of getindex, which in turn is widely used in all kernels.

What about interior()?

I'm not sure what you are suggesting. You want to change the definition of interior? How would you define the function that we currently call interior in that case? view is a Julia function and I think is well-motivated for this.

glwagner avatar Apr 22 '25 20:04 glwagner

That's cool! Can we make getindex() behave like that for Fields by default? view(u[10], 1, 2:4, 2) is not super intuitive.

I am hesitant to do this because it leads to different behavior than in Base julia. This behavior follows directly from our definition of getindex, which in turn is widely used in all kernels.

What about interior()?

I'm not sure what you are suggesting. You want to change the definition of interior? How would you define the function that we currently call interior in that case? view is a Julia function and I think is well-motivated for this.

Okay. I guess I'm just slightly surprised that I can use view() and still retain grid properties, plot natively, etc. I don't see that anywhere in the docs and view() isn't a function I see being used a lot (but maybe that's just my bias).

What else to we need to merge this PR?

tomchor avatar Apr 22 '25 21:04 tomchor

That's cool! Can we make getindex() behave like that for Fields by default? view(u[10], 1, 2:4, 2) is not super intuitive.

I am hesitant to do this because it leads to different behavior than in Base julia. This behavior follows directly from our definition of getindex, which in turn is widely used in all kernels.

What about interior()?

I'm not sure what you are suggesting. You want to change the definition of interior? How would you define the function that we currently call interior in that case? view is a Julia function and I think is well-motivated for this.

Okay. I guess I'm just slightly surprised that I can use view() and still retain grid properties, plot natively, etc. I don't see that anywhere in the docs and view() isn't a function I see being used a lot (but maybe that's just my bias).

What else to we need to merge this PR?

It is used in the Fields tutorial:

https://clima.github.io/OceananigansDocumentation/stable/fields/

If you would like to contribute more examples to the tutorial I think that is welcome. I am not sure where else we would expect to find this.

glwagner avatar Apr 22 '25 21:04 glwagner

I looked into it and I find that DimensionalData uses a function rebuildsliced to provide view-like behavior with getindex:

https://github.com/rafaqz/DimensionalData.jl/blob/20b42dfe35d05f5b8f75bc2ade122123be152f46/src/array/indexing.jl#L129

The idea is basically that getindex(f, 1, 2, 3) returns a scalar (we need this, it is widely used in all kernels) but if we have a range index, we receive a Field. Note that view is different in this case, because view has subarray under the hood, whereas a range index creates a copy.

glwagner avatar Apr 22 '25 21:04 glwagner

As for the present PR, I personally think it is complete.

glwagner avatar Apr 22 '25 21:04 glwagner