Oceananigans.jl
Oceananigans.jl copied to clipboard
Add time to `Field` generated from `FieldTimeSeries`
This PR illustrates how to add time information to a Field when it is created by indexing into a FieldTimeSeries.
@tomchor do you need this?
Yes I think this would be good. I was waiting for test to pass before reviewing, but let me know if you'd like me to review it now.
This PR just illustrates how to do it, so I need a review to know if it is worth pursuing.
Also the review can help resolve failing tests btw! I don't think we should have a policy that reviews wait for passing tests. It would make things way slower plus its a bit anti-collaborative
Sounds good. I'll take a closer look at it soon.
I have historically waited for the point where either reviews are requested or when there are only one or two failing tests left for review. But I'll start reviewing PRs sooner then.
I mean, I guess it's ultimately a personal choice, but I would just say that we're all in this together. I think we have worked together on many PRs starting much earlier than passing tests (eg writing the whole PR collaboratively for example).
PR looks good in the sense that it introduces FixedTime. But it doesn't actually seem to implement the FixedTime as status when slicing FieldTimeSeries, correct? e.g.:
julia> ω_timeseries
128×128×1×84 FieldTimeSeries{InMemory} located at (Face, Face, Center) of ω at two_dimensional_turbulence.jld2
├── grid: 128×128×1 RectilinearGrid{Float64, Periodic, Periodic, Flat} on CPU with 3×3×0 halo
├── indices: (:, :, :)
├── time_indexing: Linear()
├── backend: InMemory()
├── path: two_dimensional_turbulence.jld2
├── name: ω
└── data: 134×134×1×84 OffsetArray(::Array{Float64, 4}, -2:131, -2:131, 1:1, 1:84) with eltype Float64 with indices -2:131×-2:131×1:1×1:84
└── max=37.8946, min=-36.1335, mean=-0.00684511
julia> ω_timeseries[2].status
Based on the the word "illustrates" in first comment I see that might be intentional. But I'm assuming we want to actually make this work before merging, correct?
why doesn't
julia> ω_timeseries[2].status
work as expected?
But I'm assuming we want to actually make this work before merging, correct?
Of course we do. I need your help.
But I'm assuming we want to actually make this work before merging, correct?
Of course we do. I need your help.
I was a bit confused because you said this PR "illustrates" how to add time information. I didn't have time to investigate further because I'm pretty swamped until Thursday, but I'll look into it on Friday and try to make it work.
But I'm assuming we want to actually make this work before merging, correct?
Of course we do. I need your help.
I was a bit confused because you said this PR "illustrates" how to add time information. I didn't have time to investigate further because I'm pretty swamped until Thursday, but I'll look into it on Friday and try to make it work.
Question 1: is it useful to have time information in field.status? Comments about this design?
I extended the implementation to try to support your use case @tomchor. My original implementation, which was meant just to show the source code changes required, only supported backend = OnDisk().
It's still an illustration because we need tests to merge this. I will add tests once we agree on the design.
But I'm assuming we want to actually make this work before merging, correct?
Of course we do. I need your help.
I was a bit confused because you said this PR "illustrates" how to add time information. I didn't have time to investigate further because I'm pretty swamped until Thursday, but I'll look into it on Friday and try to make it work.
Question 1: is it useful to have time information in
field.status? Comments about this design?
I'm not sure if there's a better place for that information, but I think having that information somewhere within Field is pretty useful, yes. Putting it in status seems good to me!
This is a separate discussion, but I think ideally in the future we'd have situations where we'd be able to do something like
julia> ω_timeseries
128×128×1×84 FieldTimeSeries{InMemory} located at (Face, Face, Center) of ω at two_dimensional_turbulence.jld2
├── grid: 128×128×1 RectilinearGrid{Float64, Periodic, Periodic, Flat} on CPU with 3×3×0 halo
├── indices: (:, :, :)
├── time_indexing: Linear()
├── backend: InMemory()
├── path: two_dimensional_turbulence.jld2
├── name: ω
└── data: 134×134×1×84 OffsetArray(::Array{Float64, 4}, -2:131, -2:131, 1:1, 1:84) with eltype Float64 with indices -2:131×-2:131×1:1×1:84
└── max=37.8946, min=-36.1335, mean=-0.00684511
julia> field = ω_timeseries[2, 10]
1×128×1 Field at time = FixedTime(0.6) and x = 0.8 #...
# Or something like that (possibly using WindowedFields
We have the information after all, so I don't see why we wouldn't cascade it down to subsets.
I extended the implementation to try to support your use case @tomchor. My original implementation, which was meant just to show the source code changes required, only supported
backend = OnDisk().It's still an illustration because we need tests to merge this. I will add tests once we agree on the design.
Nice, thanks! Can confirm that it works as expected with OnDisk().
This is a separate discussion, but I think ideally in the future we'd have situations where we'd be able to do something like
I don't completely understand the abstraction you're proposing, can you elaborate?
This is a separate discussion, but I think ideally in the future we'd have situations where we'd be able to do something like
I don't completely understand the abstraction you're proposing, can you elaborate?
I'm not proposing anything other than what's already being done for this PR at the moment. But what I was discussing was the possibility that not only the time information, but also spatial information can be passed down from Fields and FieldTimeSeries to whatever subset of that Field we create via indexing and slicing. This PR already implements the time-information part of it :)
Hopefully that makes it clearer, but lmk if it doesn't. I don't quite know a good way to explain it.
This is a separate discussion, but I think ideally in the future we'd have situations where we'd be able to do something like
I don't completely understand the abstraction you're proposing, can you elaborate?
I'm not proposing anything other than what's already being done for this PR at the moment. But what I was discussing was the possibility that not only the time information, but also spatial information can be passed down from
Fields andFieldTimeSeriesto whatever subset of thatFieldwe create via indexing and slicing. This PR already implements the time-information part of it :)Hopefully that makes it clearer, but lmk if it doesn't. I don't quite know a good way to explain it.
Field already inherits the location and grid from FieldTimeSeries. What other spatial information do you need?
This is a separate discussion, but I think ideally in the future we'd have situations where we'd be able to do something like
I don't completely understand the abstraction you're proposing, can you elaborate?
I'm not proposing anything other than what's already being done for this PR at the moment. But what I was discussing was the possibility that not only the time information, but also spatial information can be passed down from
Fields andFieldTimeSeriesto whatever subset of thatFieldwe create via indexing and slicing. This PR already implements the time-information part of it :) Hopefully that makes it clearer, but lmk if it doesn't. I don't quite know a good way to explain it.
Fieldalready inherits the location and grid fromFieldTimeSeries. What other spatial information do you need?
For this PR, I think what you're doing is already great and I'd be happy to merge once there's a test there.
My other comments have to do with the fact that if you index a FieldTimeSeries in any dimension that it's not time, then you lose all information about the grid. For example:
julia> u[10][1, 2:4, 2]
3-element Vector{Float64}:
0.0027635785410754453
0.0022795491934685884
0.0019277783604979431
There's no way to know where that data is located. As a comparison, if I do the same thing in xarray (with a different file) I still get information about where the data is in the grid and time:
In [12]: ds.u.isel(time=10, xF=1, yC=range(2, 5), zC=2)
Out[12]:
<xarray.DataArray 'u' (yC: 3)>
array([14.13936664, 19.20323208, 20.05029434])
Coordinates:
zC float64 0.1562
xF float64 0.1963
* yC (yC) float64 0.4909 0.6872 0.8836
time float64 10.0
Attributes:
units: m/s
long_name: Velocity in the x-direction
This is one of the points I tried to discuss in https://github.com/CliMA/Oceananigans.jl/discussions/4364. I do think this PR is a nice and straightforward first step.
Instead of
u[10][1, 2:4, 2]
try
view(u[10], 1, 2:4, 2)
That's cool! Can we make getindex() behave like that for Fields by default? view(u[10], 1, 2:4, 2) is not super intuitive.
That's cool! Can we make
getindex()behave like that forFields by default?view(u[10], 1, 2:4, 2)is not super intuitive.
I am hesitant to do this because it leads to different behavior than in Base julia. This behavior follows directly from our definition of getindex, which in turn is widely used in all kernels.
That's cool! Can we make
getindex()behave like that forFields by default?view(u[10], 1, 2:4, 2)is not super intuitive.I am hesitant to do this because it leads to different behavior than in Base julia. This behavior follows directly from our definition of
getindex, which in turn is widely used in all kernels.
What about interior()?
That's cool! Can we make
getindex()behave like that forFields by default?view(u[10], 1, 2:4, 2)is not super intuitive.I am hesitant to do this because it leads to different behavior than in Base julia. This behavior follows directly from our definition of
getindex, which in turn is widely used in all kernels.What about
interior()?
I'm not sure what you are suggesting. You want to change the definition of interior? How would you define the function that we currently call interior in that case? view is a Julia function and I think is well-motivated for this.
That's cool! Can we make
getindex()behave like that forFields by default?view(u[10], 1, 2:4, 2)is not super intuitive.I am hesitant to do this because it leads to different behavior than in Base julia. This behavior follows directly from our definition of
getindex, which in turn is widely used in all kernels.What about
interior()?I'm not sure what you are suggesting. You want to change the definition of
interior? How would you define the function that we currently callinteriorin that case?viewis a Julia function and I think is well-motivated for this.
Okay. I guess I'm just slightly surprised that I can use view() and still retain grid properties, plot natively, etc. I don't see that anywhere in the docs and view() isn't a function I see being used a lot (but maybe that's just my bias).
What else to we need to merge this PR?
That's cool! Can we make
getindex()behave like that forFields by default?view(u[10], 1, 2:4, 2)is not super intuitive.I am hesitant to do this because it leads to different behavior than in Base julia. This behavior follows directly from our definition of
getindex, which in turn is widely used in all kernels.What about
interior()?I'm not sure what you are suggesting. You want to change the definition of
interior? How would you define the function that we currently callinteriorin that case?viewis a Julia function and I think is well-motivated for this.Okay. I guess I'm just slightly surprised that I can use
view()and still retain grid properties, plot natively, etc. I don't see that anywhere in the docs andview()isn't a function I see being used a lot (but maybe that's just my bias).What else to we need to merge this PR?
It is used in the Fields tutorial:
https://clima.github.io/OceananigansDocumentation/stable/fields/
If you would like to contribute more examples to the tutorial I think that is welcome. I am not sure where else we would expect to find this.
I looked into it and I find that DimensionalData uses a function rebuildsliced to provide view-like behavior with getindex:
https://github.com/rafaqz/DimensionalData.jl/blob/20b42dfe35d05f5b8f75bc2ade122123be152f46/src/array/indexing.jl#L129
The idea is basically that getindex(f, 1, 2, 3) returns a scalar (we need this, it is widely used in all kernels) but if we have a range index, we receive a Field. Note that view is different in this case, because view has subarray under the hood, whereas a range index creates a copy.
As for the present PR, I personally think it is complete.