NCDatasets.jl
Use DiskArrays for indexing, broadcasting etc
I am opening this issue to discuss whether this package should use DiskArrays.jl to implement efficient indexing, reductions and broadcasting for NCDatasets. This would enable workflows like the sketch below without loading all data into memory first.
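For illustration, this is the kind of workflow it would enable (a sketch only; the file name and variable name are placeholders):

```julia
using NCDatasets

ds = NCDataset("large_file.nc")   # placeholder file
v = ds["temperature"]             # placeholder variable
# broadcast + reduction evaluated chunk by chunk, never materializing v fully:
m = maximum(v .- 273.15)
close(ds)
```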
There would be 3 ways to combine the two packages:

- Make `Variable` a direct subtype of `AbstractDiskArray`, implement the necessary methods (analogous to this PR) and delete the old getindex/setindex. This would add a hard dependency on DiskArrays.
- Use the `implement_diskarray` macro to directly overwrite the Base methods. This could happen with `Requires`, so that users only get this behavior when they load the DiskArrays package. This would add a soft dependency on DiskArrays.
- Simply show in the Readme/docs how to wrap a `Variable` into a DiskArray, e.g. like in the unit tests https://github.com/meggart/DiskArrays.jl/blob/master/test/runtests.jl#L6-L38 (see the sketch below). Then the user can decide how to use a `Variable`, but would have to add some boilerplate code.
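As an illustration of the third option, a minimal wrapper along the lines of the linked unit tests could look like the following sketch (`DiskVariable` is an illustrative name, not an existing type):

```julia
using DiskArrays

# Thin AbstractDiskArray wrapper around an array-like variable.
struct DiskVariable{T,N,A<:AbstractArray{T,N}} <: DiskArrays.AbstractDiskArray{T,N}
    parent::A
end

Base.size(v::DiskVariable) = size(v.parent)

# DiskArrays hands readblock!/writeblock! already-resolved unit ranges,
# so the wrapper only needs to copy data in and out.
function DiskArrays.readblock!(v::DiskVariable, aout, r::AbstractUnitRange...)
    aout .= v.parent[r...]
end

function DiskArrays.writeblock!(v::DiskVariable, ain, r::AbstractUnitRange...)
    v.parent[r...] .= ain
end
```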
Let me know if you have questions or which option you would prefer in general, and I can help with a PR.
I just tried the master version of your NetCDF.jl package and the DiskArrays integration works very nicely and naturally! Do I understand it correctly that the disk access will be done in chunks of data?

It seems that `getchunksize` / `haschunks` return the chunk size / chunk mode as defined in the NetCDF file. Is this correct? Can you explain how NetCDF.jl / DiskArrays.jl behave if they are not defined?

Thanks!
> Do I understand it correctly, that the disk access will be done in chunks of data?

Yes.

> It seems that getchunksize / haschunks are the chunk size / chunk mode as defined in the NetCDF file. Is this correct?

Yes.

In case the chunks are not defined, DiskArrays still operates on chunks, but it falls back to this function https://github.com/meggart/DiskArrays.jl/blob/master/src/chunks.jl#L65-L77 which simply guesses a chunk size. This is controlled by the parameter `default_chunk_size`, which sets the approximate amount of data you want to hold in memory at a time (defaults to 100 MB):
```julia
julia> using DiskArrays

julia> DiskArrays.estimate_chunksize((1000,1000,1000), sizeof(Float64))
(1000, 1000, 12)

julia> DiskArrays.default_chunk_size[] = 1  # Set chunks to 1 MB only
1

julia> DiskArrays.estimate_chunksize((1000,1000,1000), sizeof(Float64))
(1000, 125, 1)
```
And just to add one thing: the chunks are only used for broadcast and reductions. Simple calls to `getindex` and `setindex!` will directly be translated to a `readblock!` call.
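Roughly, the translation works like this (an illustrative sketch only; the real `interpret_indices_disk` also resolves trailing dimensions, linear indices and Colons):

```julia
using DiskArrays

# Illustrative only: a getindex with unit ranges becomes a single block read.
function getindex_disk_sketch(a, i::AbstractUnitRange...)
    aout = Array{eltype(a)}(undef, length.(i)...)
    DiskArrays.readblock!(a, aout, i...)  # one contiguous read from disk
    return aout
end
```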
> [...] DickArrays.jl [...]

We need built-in auto-spelling for this package 😄
Nice work with DiskArrays! Combined with ESDL and built-in distributed workloads, I feel that the JuliaClimate stack is way better now than it was when I began developing ClimateTools 3-4 years ago. I will look into how I can use those packages and develop more climate analysis on top of them.
I made an attempt in a branch `DiskArrays`, but I am hitting this error:

```
Got exception outside of a @test
ArgumentError: Indices of type Tuple{StepRange{Int64,Int64},StepRange{Int64,Int64}} are not yet
Stacktrace:
 [1] interpret_indices_disk(::NCDatasets.Variable{Float64,2}, ::Tuple{StepRange{Int64,Int64},StepRange{Int64,Int64}}) at /Users/travis/.julia/packages/DiskArrays/RJ0eT/src/DiskArrays.jl:61
```
I guess it comes from here:
https://github.com/meggart/DiskArrays.jl/blob/f84f40afe9b0d381732103b692133d641d30b7e8/src/DiskArrays.jl#L61
It seems that DiskArrays only supports indexing with a stride (step) equal to one. Is this correct? Would it be possible to also support Julia 1.0.5 (I still support this version in my latest release)? But I admit, it is quite annoying to do so. Maybe it is better if I report such questions in the DiskArrays issue tracker; let me know what you prefer.
Having issues collected here in this thread is fine. Regarding the Julia compatibility, I just created a branch that checks for the Julia version and for Julia 1.0 only implements getindex/setindex! but omits the other niceties like reductions and broadcasting, which need at least Julia 1.2. Would this work for you? https://github.com/meggart/DiskArrays.jl/pull/10
Regarding the strided reading, I am not sure what to do. The simplest solution would be to add a method that simply reads the whole array and then returns only the desired values. The positive thing is that it would work out of the box for all DiskArray backends.

An alternative would be to extend the DiskArray interface with an optional `readblock_strided` function, which takes the strides as a second argument. Then backends like NetCDF or HDF5, which provide a specialized function for strided reading, could implement this function, while other backends like Zarr could fall back to a default solution as described above.

How common is it for you to read StepRanges along a dimension? I actually never needed it, but if it is important in other fields, we can make support for this first-class.
> Having issues collected here in this thread is fine. Regarding the Julia compatibility, I just created a branch that checks for the Julia version and for Julia 1.0 only implements getindex/setindex! but omits the other niceties like reductions and broadcasting, which need at least Julia 1.2. Would this work for you? meggart/DiskArrays.jl#10
Yes, this would be great. Let's try this. But if it gets too complicated we will have to drop version 1.0.
> Regarding the strided reading, I am not sure what to do. The simplest solution would be to add a method that simply reads the whole array and then returns only the desired values. The positive thing is that it would work out of the box for all DiskArray backends.
>
> An alternative would be to extend the DiskArray interface with an optional `readblock_strided` function, which takes the strides as a second argument. Then backends like NetCDF or HDF5, which provide a specialized function for strided reading, could implement this function, while other backends like Zarr could fall back to a default solution as described above.
>
> How common is it for you to read StepRanges along a dimension? I actually never needed it, but if it is important in other fields, we can make support for this first-class.
I actually do use strided reads sometimes, e.g. to make a quick plot of a high-resolution dataset. Another use case could be computing e.g. an average over 3x3 adjacent grid cells when the dataset is too large to hold in memory (there are of course multiple ways to do it, but reading with a stride of 3 seems to me the most natural).

As far as I can tell, this is implemented in Python, Octave and MATLAB. It would be sad to lose feature parity here (when dealing with very large datasets where reading the whole array is not possible).
The `readblock_strided` approach seems to me the best solution.
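For reference, the generic fallback (read the covering unit ranges, then subsample in memory) could be as small as this sketch (`readblock_strided_fallback!` is a hypothetical name):

```julia
using DiskArrays

# Assumes aout has size length.(r); works for positive and negative steps.
function readblock_strided_fallback!(a, aout, r::StepRange...)
    ur = map(s -> min(first(s), last(s)):max(first(s), last(s)), r)  # covering ranges
    tmp = Array{eltype(a)}(undef, length.(ur)...)
    DiskArrays.readblock!(a, tmp, ur...)             # one unstrided read
    sel = map((s, u) -> s .- (first(u) - 1), r, ur)  # strided positions within tmp
    aout .= view(tmp, sel...)
end
```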
Maybe multiple dispatch can help here?
The NetCDF/HDF5/Zarr package defines what type of indices it supports:

```julia
readblock!(v::ZarrVar, aout, r::AbstractUnitRange...)
readblock!(v::NetCDFVar, aout, r::Colon...)             # calls nc_get_var (all the data)
readblock!(v::NetCDFVar, aout, r::AbstractUnitRange...) # calls nc_get_vara (stride = 1)
readblock!(v::NetCDFVar, aout, r::StepRange...)         # calls nc_get_vars (arbitrary stride)
```

In DiskArrays we define generic, less-specific implementations:

```julia
readblock!(v, aout, r::StepRange...)  # call readblock! with unit ranges and then subset
```
> Maybe multiple dispatch can help here?
Funny, I was experimenting with exactly the same idea yesterday on this branch: https://github.com/meggart/DiskArrays.jl/pull/11 The only caveat here was that it is easy to run into method ambiguities when you are too lax on the types of `r`; e.g. `readblock!(v::ZarrVar, aout, r...)` would be ambiguous. However, I still like this approach: a package can decide whether it wants to support `OrdinalRange` or `AbstractUnitRange`, and we have the fallback solution.
So, if no one objects, I would merge the two PRs to DiskArrays soon and make a release afterwards. Then you can give your branch another try.
Great, thanks Fabian! I will test the updated version of DiskArrays. I will also make some benchmarks; so far NCDatasets is about 2x faster than Python and R. I hope we can keep this advantage...
Hi @Alexander-Barth I just wanted to let you know that the DiskArray branches are merged and tagged. What did your performance shootout result in?
Thank you for letting me know! I am still trying to update the branch so that it passes the test suite. I have some problems with NetCDF scalars (arrays with 0 dimensions), but at this stage I am not sure what kind of changes are required where.

@meggart, do you have an opinion on whether a NetCDF scalar should be a 0D array or a regular Julia scalar?
When I index the `PseudoDiskArray` example wrapping a 0D array, I get the following error:
```julia
julia> A = randn(())
0-dimensional Array{Float64,0}:
0.19403206384705804

julia> A[1]
0.19403206384705804

julia> PDA = PseudoDiskArray(A)
Disk Array with size

julia> PDA[1]
ERROR: BoundsError: attempt to access ()
  at index [1]
Stacktrace:
 [1] indexed_iterate at ./tuple.jl:63 [inlined] (repeats 2 times)
 [2] interpret_indices_disk(::PseudoDiskArray{Float64,0,Array{Float64,0}}, ::Tuple{Int64}) at /home/abarth/.julia/dev/DiskArrays/src/DiskArrays.jl:111
 [3] getindex_disk(::PseudoDiskArray{Float64,0,Array{Float64,0}}, ::Int64) at /home/abarth/.julia/dev/DiskArrays/src/DiskArrays.jl:58
 [4] getindex(::PseudoDiskArray{Float64,0,Array{Float64,0}}, ::Int64) at /home/abarth/.julia/dev/DiskArrays/src/DiskArrays.jl:182
 [5] top-level scope at REPL[41]:1
 [6] eval(::Module, ::Any) at ./boot.jl:330
 [7] eval_user_input(::Any, ::REPL.REPLBackend) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.3/REPL/src/REPL.jl:86
 [8] run_backend(::REPL.REPLBackend) at /home/abarth/.julia/packages/Revise/Pcs5V/src/Revise.jl:1073
 [9] top-level scope at none:0
```
But this works:
```julia
julia> PDA[:]
Reading at index
1-element Array{Float64,1}:
 0.19403206384705804
```
I am using https://github.com/meggart/DiskArrays.jl/tree/1b69b350d320d671e27c13ef5de9aed7e23f17fc

It is indeed quite a strange use case of DiskArrays and NCDatasets to use scalar NetCDF variables, but some ocean models I work with (ROMS) use them.
This is a good catch and definitely a bug in DiskArrays. I will fix this.
https://github.com/meggart/DiskArrays.jl/pull/14
The bugfix is merged and tagged. Could you try this again?
Thank you for the fix! I confirm that 0D arrays work now. The next issue is arrays with unlimited dimensions (in NetCDF 3.x you can have one, and in NetCDF 4.x you can have several). The idea is that the variable grows as needed. It seems that this is not supported in DiskArrays. I guess that an additional function is needed to tell DiskArrays whether a given dimension is unlimited or not.

For your information, this is the type of code that I would like to run:
```julia
filename = tempname();
ds = NCDataset(filename, "c")
defDim(ds, "x", 10)
defDim(ds, "Time", Inf)
defVar(ds, "w", Float64, ("x", "Time"))
ds["w"][:,:] = ones(10,10)
# w should grow along the unlimited dimension
ds["w"][:,:] = ones(10,15)
```
I get this kind of error:

```
ERROR: DimensionMismatch("new dimensions (10, 0) must be consistent with array size 100")
Stacktrace:
 [1] (::Base.var"#throw_dmrsa#196")(::Tuple{Int64,Int64}, ::Int64) at ./reshapedarray.jl:41
 [2] reshape(::Array{Float64,2}, ::Tuple{Int64,Int64}) at ./reshapedarray.jl:45
 [3] setindex_disk!(::NCDatasets.Variable{Float64,2}, ::Array{Float64,2}, ::Function, ::Vararg{Function,N} where N) at /home/abarth/.julia/dev/DiskArrays/src/DiskArrays.jl:65
 [4] setindex! at /home/abarth/.julia/dev/DiskArrays/src/DiskArrays.jl:189 [inlined]
 [5] setindex!(::NCDatasets.CFVariable{Float64,2,NCDatasets.Variable{Float64,2},NCDatasets.Attributes,NamedTuple{(:fillvalue, :scale_factor, :add_offset, :calendar, :time_or...
```
OK, this is of course a trickier use case. Note that `ds["w"][1:10,1:15] = ones(10,15)` should already work with the current implementation. I am a bit hesitant to generally allow colon indexing to do everything. For example, do you think that for a disk array `a` of size 10x10, `a[:,:] = ones(4,4)` should be equivalent to `a[1:4,1:4] = ones(4,4)`, i.e. let the RHS determine the resolution of the Colon? Seems a bit counterintuitive to me; maybe others could comment as well.
> For example, do you think that for a disk array a of size 10x10 this a[:,:] = ones(4,4) should be equivalent to a[1:4,1:4] = ones(4,4), i.e. let the RHS determine the resolution of the Colon? Seems a bit counterintuitive to me, maybe others could comment as well.

I agree this shouldn't work.

> ds["w"][:,:] = ones(10,10) # w should grow along the unlimited dimension ds["w"][:,:] = ones(10,15)

It seems like a strange use of `:` to me too... Could you instead define a method to do that? Like `expand!` or something, so it's clearly labelled as something new that doesn't exist in Base?
In Python's netCDF4, these statements work in the same way (as currently in NCDatasets):

```python
import numpy as np
import netCDF4

f = netCDF4.Dataset('test.nc', 'w')
dim_t = f.createDimension('time', None)
nctime = f.createVariable('time', np.float32, ('time'))
nctime[:] = np.arange(0, 5)
nctime[:] = np.arange(0, 10)
f.close()
```
which produces:

```
netcdf test {
dimensions:
    time = UNLIMITED ; // (10 currently)
variables:
    float time(time) ;
data:
    time = 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ;
}
```
This is quite a reasonable choice to me, as the NetCDF variable is declared as "grow-able", and it is certainly more convenient for the user than writing:

```julia
nctime[1:length(data1)] = data1
```
In NetCDF 4 there can be multiple unlimited dimensions, for instance if the 2nd and 3rd dimensions are unlimited. This

```julia
ncdata[:, 1:size(data,2), 1:size(data,3)] = data
```

is quite unwieldy compared to just:

```julia
ncdata[:,:,:] = data
```

It reminds me of my Fortran 77 days, when I had to specify the data and the length of the dimensions. 😉 (and hell breaks loose if there is an inconsistency)
I totally agree that the functionality is good and useful. But using `Colon`/`setindex!` syntax like that is not Julian - you can't do that with `Array`. It will create problems for standardising packages like DiskArrays.jl and GeoData.jl.

This irregularity is already holding back implementing a DiskArrays.jl backend here, which is holding back using DiskArrays.jl in GeoData.jl (which will be a game-changing standardisation).

Why not implement your own more flexible `setindex!`-style method:

```julia
grow!(ncdata, data)
grow!(ncdata, data, 2)
```

etc... Then you can do anything you like with it. You do lose `[]` bracket `setindex!` syntax, but people use `view`, so it's not unfamiliar.
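For what it's worth, a minimal sketch of what such a `grow!` could do (hypothetical, not an existing API):

```julia
using DiskArrays

# The explicit function, not setindex!, takes the new extent from the RHS,
# so setindex! itself can stay strictly bounds-checked.
function grow!(v, data::AbstractArray)
    inds = map(n -> 1:n, size(data))          # extent determined by the RHS
    DiskArrays.writeblock!(v, data, inds...)  # backend grows unlimited dims as needed
    return v
end
```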
FWIW, for testing purposes I have implemented a new branch at DiskArrays https://github.com/meggart/DiskArrays.jl/commit/11680a6a6453961582338ae97a7840e62a7bf50e which adds a `resizable_indices` trait, where data types can decide which indices get expanded in the way @Alexander-Barth describes. This means that data types which don't explicitly enable this behavior will not change in behavior. However, I still think this syntax would not be very Julia-like, so I was holding back merging this until there is a bit more feedback.
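Roughly, the idea of the trait is the following (a sketch; `unlimited_dimension_numbers` is a hypothetical helper, and the exact API on that branch may differ):

```julia
# Default: no dimension may grow on setindex!.
DiskArrays.resizable_indices(::AbstractArray) = ()

# A backend opts in by listing the dimensions that are allowed to grow,
# e.g. the unlimited dimensions of a NetCDF variable.
DiskArrays.resizable_indices(v::NCDatasets.Variable) = unlimited_dimension_numbers(v)
```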
Thanks @meggart, I can confirm that the code in https://github.com/Alexander-Barth/NCDatasets.jl/issues/79#issuecomment-610598730 now works.

For this test case, https://github.com/Alexander-Barth/NCDatasets.jl/blob/master/test/test_append2.jl#L30, I had to make a small change in `DiskArrays.get_size2`, because the NetCDF variable has three dimensions (but is sliced with `[:,:,1]`) while the data (i.e. the RHS) has just two. Therefore, your variable `a` has 3 elements but `v` only two, which leads to the out-of-bounds error copied below.

```julia
function get_size2(ri, a, v)
    ntuple(i -> (in(i, ri) && (i <= length(v))) ? v[i] : a[i], length(a))
end
```

Do you agree with this change in DiskArrays?
```julia
julia> tempvar[:,:,itime] = [i+j for i = 1:length(lons), j = 1:length(lats)]
ERROR: BoundsError: attempt to access (22, 13)
  at index [3]
Stacktrace:
 [1] getindex(::Tuple, ::Int64) at ./tuple.jl:24
 [2] #19 at /home/abarth/.julia/dev/DiskArrays/src/DiskArrays.jl:189 [inlined]
 [3] ntuple(::DiskArrays.var"#19#20"{Tuple{Int64},Tuple{Int64,Int64,Int64},Tuple{Int64,Int64}}, ::Int64) at ./ntuple.jl:18
 [4] get_size2(::Tuple{Int64}, ::Tuple{Int64,Int64,Int64}, ::Tuple{Int64,Int64}) at /home/abarth/.julia/dev/DiskArrays/src/DiskArrays.jl:189
 [5] get_size2(::NCDatasets.Variable{Float32,3,NCDataset}, ::Array{Float32,2}) at /home/abarth/.julia/dev/DiskArrays/src/DiskArrays.jl:188
 [6] setindex_disk!(::NCDatasets.Variable{Float32,3,NCDataset}, ::Array{Float32,2}, ::Function, ::Vararg{Any,N} where N) at /home/abarth/.julia/dev/DiskArrays/src/DiskArrays.jl:72
 [7] setindex! at /home/abarth/.julia/dev/DiskArrays/src/DiskArrays.jl:201 [inlined]
 [8] setindex!(::NCDatasets.CFVariable{Float32,3,NCDatasets.Variable{Float32,3,NCDataset},NCDatasets.Attributes{NCDataset{Nothing}},NamedTuple{(:fillvalue, :scale_factor, :add_offset, :calendar, :time_origin, :time_factor),NTuple{6,Nothing}}}, ::Array{Int64,2}, ::Colon, ::Colon, ::Int64) at /home/abarth/.julia/dev/NCDatasets/src/cfvariable.jl:548
 [9] top-level scope at REPL[41]:1
 [10] eval(::Module, ::Any) at ./boot.jl:331
 [11] eval_user_input(::Any, ::REPL.REPLBackend) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/REPL/src/REPL.jl:86
 [12] run_backend(::REPL.REPLBackend) at /home/abarth/.julia/packages/Revise/Pcs5V/src/Revise.jl:1073
 [13] top-level scope at none:0
```
Do you also support arrays of strings? They were introduced in NetCDF 4. I am getting this error message:
```julia
julia> include("/home/abarth/.julia/dev/NCDatasets/test/test_variable.jl");
ERROR: LoadError: Type String does not have a definite size.
Stacktrace:
 [1] sizeof at ./essentials.jl:451 [inlined]
 [2] estimate_chunksize at /home/abarth/.julia/dev/DiskArrays/src/chunks.jl:65 [inlined]
 [3] eachchunk_view(::DiskArrays.Unchunked, ::SubArray{String,2,NCDatasets.Variable{String,2,NCDataset},Tuple{Base.OneTo{Int64},Base.OneTo{Int64}},false}) at /home/abarth/.julia/dev/DiskArrays/src/subarrays.jl:35
 [4] eachchunk(::DiskArrays.SubDiskArray{String,2}) at /home/abarth/.julia/dev/DiskArrays/src/subarrays.jl:25
 [5] fill!(::DiskArrays.SubDiskArray{String,2}, ::String) at /home/abarth/.julia/dev/DiskArrays/src/ops.jl:140
 [6] copyto! at ./broadcast.jl:872 [inlined]
 [7] materialize!(::DiskArrays.SubDiskArray{String,2}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0},Nothing,typeof(identity),Tuple{Base.RefValue{String}}}) at ./broadcast.jl:823
 [8] (::var"#15#18")(::NCDataset{Nothing}) at /home/abarth/.julia/dev/NCDatasets/test/test_variable.jl:61
 [9] NCDataset(::var"#15#18", ::String, ::Vararg{String,N} where N; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/abarth/.julia/dev/NCDatasets/src/dataset.jl:286
 [10] NCDataset(::Function, ::String, ::Vararg{String,N} where N) at /home/abarth/.julia/dev/NCDatasets/src/dataset.jl:284
 [11] top-level scope at /home/abarth/.julia/dev/NCDatasets/test/test_variable.jl:14
 [12] include(::String) at ./client.jl:439
 [13] top-level scope at REPL[3]:1
 [14] eval(::Module, ::Any) at ./boot.jl:331
 [15] eval_user_input(::Any, ::REPL.REPLBackend) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.4/REPL/src/REPL.jl:86
 [16] run_backend(::REPL.REPLBackend) at /home/abarth/.julia/packages/Revise/Pcs5V/src/Revise.jl:1073
 [17] top-level scope at none:0
in expression starting at /home/abarth/.julia/dev/NCDatasets/test/test_variable.jl:14
```
I guess that in

```julia
estimate_chunksize(a::AbstractArray) = estimate_chunksize(size(a), sizeof(eltype(a)))
```

we need to assume a typical string length to estimate the chunk size.
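One possible workaround (my assumption: treat non-bitstypes as a fixed per-element size, here 32 bytes, which only affects the chunking heuristic):

```julia
# Fall back to a guessed element size for types like String with no definite size.
approx_elsize(::Type{T}) where {T} = isbitstype(T) ? sizeof(T) : 32
estimate_chunksize(a::AbstractArray) = estimate_chunksize(size(a), approx_elsize(eltype(a)))
```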
Similarly, in NetCDF 4 there are also variable-length arrays, which are loaded as `Array{Vector{T},N}` in NCDatasets. Can you confirm that they are within the scope of DiskArrays?
@meggart:

> However, I still think this syntax would not be very Julia-like, so I was holding back merging this until there is a bit more feedback.
This syntax means GeoData.jl and DimensionalData.jl will have to check the index size on every `setindex!` call to fully support DiskArrays.jl, and will need a method for rebuilding every dimension index when it grows. Otherwise the dimension index can be the wrong size after `setindex!`, which is usually not possible in Julia.

The same goes for NamedDims, AxisArrays and every other package that has a dimension index - they will need to handle this unusual behaviour to support DiskArrays.jl. I'm pretty concerned about it being merged because of these consequences.

DiskArrays.jl could define another method for this that is optionally supported, which I think would be better than having a broken `setindex!` everywhere.
Even in the current master version of DiskArrays, disk arrays can grow (https://github.com/Alexander-Barth/NCDatasets.jl/issues/79#issuecomment-611495930). The question is just whether you need to write it like this

```julia
ds["w"][1:10,1:15] = ones(10,15)
```

or whether you can use the shorter way (as in Python's netCDF4 and in the current version of NCDatasets):

```julia
ds["w"][:,:] = ones(10,15)
```

In both cases `setindex!` does change the size of the array.

I use unlimited time dimensions quite a lot, and I do not see myself writing the first variant all the time.
In the current case we at least know the size from the indices passed to `setindex!`. With the proposed changes we need to check the array bounds. But I also think the current behaviour is going to be a problem.

It means that strange errors will occur in packages wrapping DiskArrays.jl if they assume standard `Array` behaviour of `setindex!`. Usually you can rely on Julia throwing a bounds error if indices are larger than the array, so you would never check that the array size has changed.

To deal with this, all wrapper packages will need to depend on DiskArrays or Requires/DiskArrays to add specific dispatch methods to handle this. Otherwise they will need to check either the indices (current behaviour) or the indices and array size (proposed behaviour) for every type of `AbstractArray`.

To me it's preferable that methods like `setindex!` can be supported or not supported at all, than that some arguments will be fine and some that look basically identical will cause strange errors. I'll have to support this behaviour either way in GeoData.jl, but it would be better in a method outside of `setindex!`, so we don't have to do those checks every time, and so that `setindex!` is just vanilla Julia.
> I use unlimited time dimensions quite a lot and I do not see myself writing the first variant all the time.

To me this discussion is about weighing up the utility of this bracket syntax against the amount of trouble it will cause in the ecosystem, not just about whether the syntax is good (on which I totally agree, it's the simplest syntax for this).

My main question is whether this syntax is so much worse:

```julia
grow!(ds["w"], ones(10,15))
# vs
ds["w"][:,:] = ones(10,15)
```

It could also take arguments if it needs to:

```julia
grow!(ds["w"], ones(10,15), 20:30, 10:25)
# vs
ds["w"][20:30, 10:25] = ones(10,15)
```
I do understand both sides of the argument here. When I started `DiskArrays`, I tried to be as un-intrusive as possible in the underlying package's decisions. So all the package does is translate a `getindex` or `setindex!` call into a respective `readblock!` call, resolving trailing/missing indices and Colons. The bounds check should be done by the underlying package if necessary (or it just decides to resize the array).
So, at least so far, I think DiskArrays will not help you ensure that array sizes do not change in the meantime. I think it would be good to get some feedback from the Julia community on whether `setindex!` in itself should always keep the size of an AbstractArray. At least for dictionaries there is one example of behavior where the data structure grows on `setindex!`, although it looks like the new Dictionaries.jl (https://github.com/andyferris/Dictionaries.jl) broke with this grow-on-setindex behavior.
Maybe it would help if I added a keyword argument `checkbounds=false` to the DiskArrays `setindex!` implementation. Then, when you call `setindex!` in your DimensionalData code, you can explicitly enforce bounds checking. In addition, we can still discuss what the default behavior for any DiskArray should be...
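Something along these lines (a sketch; the function name and keyword are hypothetical, not part of any released DiskArrays API):

```julia
using DiskArrays

function setindex_disk_checked!(a, v::AbstractArray, i::AbstractUnitRange...;
                                checkbounds = false)
    # With checkbounds = true, out-of-range indices throw instead of growing the array.
    checkbounds && Base.checkbounds(a, i...)
    DiskArrays.writeblock!(a, v, i...)
    return v
end
```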