Reading data along chunked dimension does not scale linearly with amount of data

Open ali-ramadhan opened this issue 5 years ago • 3 comments

Super cool work on integrating DiskArrays.jl with NetCDF.jl! Looking forward to ditching xarray in favor of a pure Julia solution.

@visr helped me get up and running, but we noticed that grabbing 2x as much data seems to take ~4x longer, whereas I expected it to scale linearly. Unfortunately, I am interested in grabbing data along the dimension with chunk size 1...

julia> using NetCDF

julia> ds = NetCDF.open("/home/alir/cnhlab004/bsose_i122/bsose_i122_2013to2017_1day_Theta.nc", "THETA")
Disk Array with size 2160 x 588 x 52 x 1826

julia> NetCDF.getchunksize(ds)
(2160, 588, 19, 1)

julia> @time ds[100, 200, :, 300]
  0.012066 seconds (48 allocations: 2.500 KiB)

julia> @time ds[100, 200, :, 320:330]
  0.010111 seconds (55 allocations: 4.750 KiB)

julia> @time ds[100, 200, :, 300:400]
  5.256234 seconds (56 allocations: 23.016 KiB)

julia> @time ds[100, 200, :, 600:800]
 19.074392 seconds (56 allocations: 43.328 KiB)
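
For reference, here is a rough count of how many chunks each of the selections above overlaps. This is only a sketch: it assumes every overlapped chunk has to be read in full, and `nchunks` and `count_chunks` are made-up helper names, not part of NetCDF.jl or DiskArrays.jl.

```julia
# Chunk shape reported by NetCDF.getchunksize(ds) above.
chunksize = (2160, 588, 19, 1)

# Number of chunks a 1-based index or range overlaps along one dimension
# with chunk length c. Hypothetical helpers, for illustration only.
nchunks(i::Integer, c) = 1
nchunks(r::AbstractUnitRange, c) = cld(last(r), c) - cld(first(r), c) + 1

# Total chunks overlapped by a selection: product over the dimensions.
count_chunks(sel, cs) = prod(map(nchunks, sel, cs))

# The `:` in the third dimension is written out as 1:52.
for sel in ((100, 200, 1:52, 300),
            (100, 200, 1:52, 320:330),
            (100, 200, 1:52, 300:400),
            (100, 200, 1:52, 600:800))
    println(sel, " => ", count_chunks(sel, chunksize), " chunks")
end
```

By this count the four reads overlap 3, 33, 303 and 603 chunks. The last two differ by a factor of about 2 in chunk count, while the measured times differ by a factor of about 3.6, so chunk count alone does not seem to account for the slowdown.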

ali-ramadhan, Feb 19 '20 16:02

It's great to have an example of such a large NetCDF file. At the moment I cannot tell whether this time is spent in the NetCDF C library or in the Julia wrapper code, though running the slower calls under a profiler should show that.
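
A minimal sketch of such a profiling run with Julia's `Profile` standard library. The `slow_read` function is a stand-in workload so the snippet runs on its own; in the real session it would be replaced by one of the slow reads, such as `ds[100, 200, :, 300:400]`.

```julia
using Profile

# Stand-in workload (an assumption, not the real read); replace with the
# slow indexing call on the actual dataset.
slow_read() = sum(sqrt(abs(sin(x))) for x in 1:5_000_000)

slow_read()        # run once first so compilation is not part of the profile
Profile.clear()    # drop any samples collected earlier
@profile slow_read()

# Flat listing sorted by sample count. C=true also shows frames from C code,
# which is where time spent inside the NetCDF C library would appear.
Profile.print(format=:flat, sortedby=:count, C=true)
```

If most samples land in C frames, the time is likely spent in the C library; if they land in Julia frames, the wrapper code is the more likely suspect.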

visr, Feb 20 '20 12:02

I agree with @visr that it is hard to say where the time is spent. Please also note that the NetCDF C library does some internal caching, so I guess your 3rd call was profiting from the previous reads. I have found it very difficult to debug these kinds of problems. Ideally you would restart your Julia session after every data access to make sure NetCDF did not cache anything, but then you include precompilation in your timings...

meggart, Feb 20 '20 15:02

I cannot reproduce this with my dataset, which is of similar size but has only three dimensions. @ali-ramadhan, is this still a problem for you?

bjarthur, Feb 21 '24 13:02