netcdf4-python
Very slow slicing with libnetcdf 4.7.4
Array slicing with libnetcdf 4.7.4 is very slow. The same slicing with 4.7.3 is fast.
To reproduce this issue, first download this file:
(I can't generate a minimal working example (MWE) file.)
And run this code:
from netCDF4 import Dataset
import time

nf = Dataset('woa18_A5B7_s16_04.nc', 'r')
t0 = time.time()
for i in range(100):
    for j in range(100):
        q = nf['s_an'][0, :, i, j]
t = time.time() - t0
print('it took: ' + str(t))
nf.close()
With this environment:
# Name Version Build Channel
libnetcdf 4.7.3 nompi_hc957ea6_101 conda-forge
netcdf4 1.5.3 nompi_py37h0154fc0_102 conda-forge
the code runs in ~3s.
With this environment:
# Name Version Build Channel
libnetcdf 4.7.4 nompi_hc957ea6_101 conda-forge
netcdf4 1.5.3 nompi_py38h5d7d79e_103 conda-forge
the code runs for so long that I have to kill the process before it finishes.
The release notes for libnetcdf 4.7.4 are here. I don't see anything in there that would obviously affect the read speed for slices.
I have confirmed that it is very slow with 4.7.4. There are 10,000 calls from Python to the C routine nc_get_vara involved (one per (i, j) pair), so the more surprising thing to me is that it ran in only 3 seconds with version 4.7.3.
@WardF , do you know of any changes in 4.7.4 that could account for this?
Wow, not off the top of my head, but let me take a look and see if I can narrow this down. @dennisheimbigner does anything leap out at you?
I've created an issue on the C project page linking back to this issue.
There was an issue about strides > 1. Is this the case here?
Strides are all 1 here (nc_get_vara is used).
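For illustration, a minimal sketch of the distinction (the step-2 slice is hypothetical and not part of the test; it assumes the same file):

from netCDF4 import Dataset

nf = Dataset('woa18_A5B7_s16_04.nc', 'r')
q_unit = nf['s_an'][0, :, 0, 0]       # stride 1 along depth: nc_get_vara underneath
q_strided = nf['s_an'][0, ::2, 0, 0]  # stride 2: would use the strided read path instead
nf.close()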
I suspect it has something to do with the chunksizes and/or the zlib compression. If I rewrite the file in netCDF classic 64-bit format (nccopy -6), the test script runs in 6 seconds. Keeping the file netCDF-4 but converting the variable to contiguous storage (without compression) also reduces the run time from >1000 seconds to 6 seconds.
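For reference, a minimal sketch of one way to produce such a contiguous, uncompressed copy with netCDF4-python (the output file name is made up; attributes are not copied, and contiguous storage requires fixed-size dimensions and no compression):

from netCDF4 import Dataset

src = Dataset('woa18_A5B7_s16_04.nc', 'r')
dst = Dataset('woa18_contiguous.nc', 'w', format='NETCDF4')

# Contiguous variables cannot use unlimited dimensions, so create all
# dimensions with fixed sizes in the copy.
for name, dim in src.dimensions.items():
    dst.createDimension(name, len(dim))

v = src['s_an']
out = dst.createVariable('s_an', v.dtype, v.dimensions,
                         zlib=False, contiguous=True)
out[:] = v[:]
dst.close()
src.close()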
Is it possible for you to avoid the loop? If I use
q = nf['s_an'][0, :, 0:100, 0:100]
the test runs in less than 0.1 seconds.
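A sketch of the loop-free equivalent, assuming the per-point profiles are still needed: read the block once, then index the in-memory array.

from netCDF4 import Dataset

nf = Dataset('woa18_A5B7_s16_04.nc', 'r')
block = nf['s_an'][0, :, 0:100, 0:100]  # one read instead of 10,000
for i in range(100):
    for j in range(100):
        q = block[:, i, j]              # pure in-memory indexing, no file I/O
nf.close()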
I'm unsure; I made the example from a much larger script that I inherited. I'll take a look.
Yes, the loop is avoidable, but to me it seems like a short-term solution (with the caveat that I know nothing about the internal workings of netCDF).
It's a performance regression in the C library. I've posted a C version of your test program to demonstrate this at Unidata/netcdf-c#1757.
There's been some more discussion over at the issue linked above, but we believe we have found the root cause: changes to default cache-related values. These can be worked around by modifying them at configure time in netcdf-c. We will evaluate reverting to the old defaults once I have refreshed my memory of the PR that increased them in the first place.
Is this slowdown observed when reading similar data, across the board? Or is this particular file we're testing against unusually slow, but other files remain relatively fast?
My speculation would have been that the slowness is caused by whatever algorithm is used to find a match in the cache. But that should get slower with the number of cache elements, not the total size.
I'm following the discussion on the C side.
Tomorrow I'll test with a few other files.
I have tried using the Variable.set_var_chunk_cache method to reset the chunk cache for the variable before slicing it, but this has no effect. Unfortunately, there is currently no hook for nc_set_chunk_cache to reset the values globally in the Python interface. I did try adding one, and it does indeed fix the problem. The test script now looks like this:
from netCDF4 import Dataset, set_chunk_cache
import time

set_chunk_cache(4194304, 1009)
nf = Dataset('woa18_A5B7_s16_04.nc', 'r')
t0 = time.time()
for i in range(100):
    for j in range(100):
        q = nf['s_an'][0, :, i, j]
t = time.time() - t0
print('it took: ' + str(t))
nf.close()
and it runs in a few seconds with both library versions 4.7.3 and 4.7.4.
PR #1019 introduces set_chunk_cache/get_chunk_cache module functions.
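Assuming the signatures mirror the underlying nc_set_chunk_cache (size in bytes, number of cache slots, preemption), usage would look roughly like this:

from netCDF4 import set_chunk_cache, get_chunk_cache

# Restore the pre-4.7.4 cache parameters before opening any files.
set_chunk_cache(size=4194304, nelems=1009)
print(get_chunk_cache())  # (size, nelems, preemption) currently in effect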
Thanks @WardF for debugging this so quickly!
I tested three other files. The test runs equally well with both library versions for woa18_decav_t01_01.nc. There's a difference for woa18_decav_t05_04.nc. There's no difference for the gmt4 file from:
https://www.ngdc.noaa.gov/mgg/global/relief/ETOPO1/data/ice_surface/grid_registered/netcdf/
The lesson, I think, is that chunk parameters are highly data dependent. It's better for the C library to be conservative with its default values, allowing broadly acceptable (if unoptimized) access speeds, than to use parameters that work great in some circumstances but are unworkable in others. I expect we will have a maintenance release of the C library in short order that reverts these parameters, barring some other solution presenting itself.
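As an illustration of that data dependence, a diagnostic sketch (the logic and output format are my own, not part of this thread) that compares a variable's chunk footprint to the process-wide chunk cache:

from netCDF4 import Dataset, get_chunk_cache
import numpy as np

nf = Dataset('woa18_A5B7_s16_04.nc', 'r')
v = nf['s_an']
chunks = v.chunking()  # per-dimension chunk lengths, or 'contiguous'
if chunks != 'contiguous':
    chunk_bytes = int(np.prod(chunks)) * v.dtype.itemsize
    size, nelems, preemption = get_chunk_cache()
    print(f'one chunk: {chunk_bytes} bytes; cache: {size} bytes in {nelems} slots')
nf.close()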
I must confess to being confused by your last comment. The issue, I thought, was the cache parameters and not the chunksizes themselves. Correct?