
Very slow slicing with libnetcdf 4.7.4

Open kdpenner opened this issue 4 years ago • 17 comments

Array slicing with libnetcdf 4.7.4 is very slow. The same slicing with 4.7.3 is fast.

To reproduce this issue, first download this file:

https://data.nodc.noaa.gov/thredds/catalog/ncei/woa/salinity/A5B7/0.25/catalog.html?dataset=ncei/woa/salinity/A5B7/0.25/woa18_A5B7_s16_04.nc

(I can't generate a MWE file.)

And run this code:

from netCDF4 import Dataset
import time

nf = Dataset('woa18_A5B7_s16_04.nc', 'r')

# Time 10,000 single-point slices of the s_an variable
t0 = time.time()
for i in range(100):
    for j in range(100):
        q = nf['s_an'][0, :, i, j]
t = time.time() - t0
print('it took: ' + str(t))
nf.close()

With this environment:

# Name                    Version                   Build  Channel
libnetcdf                 4.7.3           nompi_hc957ea6_101    conda-forge
netcdf4                   1.5.3           nompi_py37h0154fc0_102    conda-forge

the code runs in ~3s.

With this environment:

# Name                    Version                   Build  Channel
libnetcdf                 4.7.4           nompi_hc957ea6_101    conda-forge
netcdf4                   1.5.3           nompi_py38h5d7d79e_103    conda-forge

the code is so slow that I kill the process before it terminates.

kdpenner avatar Jun 10 '20 18:06 kdpenner

The release notes for libnetcdf 4.7.4 are here. I don't see anything in there that would obviously affect the read speed for slices.

I have confirmed that it is very slow with 4.7.4. There are 10,000 calls from python to the C routine nc_get_vara involved, so the more surprising thing to me is that it ran in only 3 seconds with version 4.7.3.

@WardF , do you know of any changes in 4.7.4 that could account for this?

jswhit avatar Jun 10 '20 20:06 jswhit

Wow, not off the top of my head, but let me take a look and see if I can narrow this down. @dennisheimbigner does anything leap out at you?

WardF avatar Jun 11 '20 16:06 WardF

I've created an issue on the C project page linking back to this issue.

WardF avatar Jun 11 '20 16:06 WardF

There was an issue about strides > 1. Is this the case here?

DennisHeimbigner avatar Jun 11 '20 16:06 DennisHeimbigner

strides are all 1 here (nc_get_vara is used)

jswhit avatar Jun 11 '20 19:06 jswhit

I suspect it has something to do with the chunksizes and/or the zlib compression. If I rewrite the file in netCDF classic 64-bit format (nccopy -6) the test script runs in 6 seconds.

Keeping the file netCDF-4, but converting it to contiguous storage (without compression), also reduces the run time from >1000 seconds to 6 seconds.
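For reference, both rewrites can be done with the nccopy utility that ships with netcdf-c; the output filenames here are illustrative, and whether dropping deflate also yields contiguous storage may depend on the nccopy version:

nccopy -6 woa18_A5B7_s16_04.nc woa18_classic.nc        # classic 64-bit offset format
nccopy -d 0 woa18_A5B7_s16_04.nc woa18_uncompressed.nc # netCDF-4, no zlib compression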

jswhit avatar Jun 11 '20 20:06 jswhit

Is it possible for you to avoid the loop? If I use

q=nf['s_an'][0,:,0:100,0:100]

the test runs in less than 0.1 seconds.
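As a sanity check that the single fancy slice is equivalent to the 10,000-iteration loop, here is a minimal sketch using a plain numpy array as a stand-in for the netCDF variable (the array shape is made up; the real variable's shape will differ):

import numpy as np

# Stand-in for nf['s_an']: (time, depth, lat, lon); shape is hypothetical
rng = np.random.default_rng(0)
s_an = rng.random((1, 57, 180, 360), dtype=np.float32)

# One read instead of 10,000: all depth profiles for the 100x100 window
block = s_an[0, :, 0:100, 0:100]

# Each per-point slice from the loop is just a column of the block
same = all(
    np.array_equal(block[:, i, j], s_an[0, :, i, j])
    for i in range(100)
    for j in range(100)
)
print(same)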

jswhit avatar Jun 11 '20 21:06 jswhit

I'm unsure---I made the example from a much larger script I inherited. I'll take a look.

kdpenner avatar Jun 11 '20 22:06 kdpenner

Yes, the loop is avoidable, but to me it seems like a short-term solution (with the caveat that I know nothing about the internal workings of netCDF).

kdpenner avatar Jun 13 '20 02:06 kdpenner

It's a performance regression in the C library. I've posted a C version of your test program to demonstrate this at Unidata/netcdf-c#1757.

jswhit avatar Jun 14 '20 20:06 jswhit

There's been some more discussion over at the issue linked above, but we believe we have found the root cause (changes in default cache-related values). These can be worked around by modifying them at configure time in netcdf-c, and we will evaluate reverting to the old defaults once I have refreshed my memory of the PR that increased them in the first place.
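A sketch of the configure-time workaround mentioned above, assuming netcdf-c's --with-chunk-cache-* configure options; the values shown are the smaller pre-4.7.4 settings used elsewhere in this thread:

./configure --with-chunk-cache-size=4194304 \
            --with-chunk-cache-nelems=1009
make && make install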

Is this slowdown observed when reading similar data, across the board? Or is this particular file we're testing against unusually slow, but other files remain relatively fast?

WardF avatar Jun 15 '20 21:06 WardF

My speculation would have been that the slowness is caused by whatever algorithm is used to find a match in the cache. But that should get slower with the number of cache elements, not the total size.

DennisHeimbigner avatar Jun 15 '20 23:06 DennisHeimbigner

I'm following the discussion on the C side.

Tomorrow I'll test with a few other files.

kdpenner avatar Jun 15 '20 23:06 kdpenner

I have tried using the Variable.set_var_chunk_cache method to reset the chunk cache for the variable before slicing it, but this has no effect. Unfortunately, there is currently no hook in the python interface for nc_set_chunk_cache, which resets the values globally. I did try adding one, and it does indeed fix the problem. The test script now looks like this:

from netCDF4 import Dataset, set_chunk_cache
import time

# Restore the older, smaller chunk cache settings before opening the file
set_chunk_cache(4194304, 1009)

nf = Dataset('woa18_A5B7_s16_04.nc', 'r')
t0 = time.time()
for i in range(100):
    for j in range(100):
        q = nf['s_an'][0, :, i, j]
t = time.time() - t0
print('it took: ' + str(t))
nf.close()

and it runs in a few seconds with both library versions 4.7.3 and 4.7.4.
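A back-of-envelope calculation shows how the values passed to set_chunk_cache relate to the data; the chunk shape below is hypothetical (the real one can be read with nf['s_an'].chunking()):

# Hypothetical chunk shape for s_an: (time, depth, lat, lon)
chunk_shape = (1, 57, 102, 102)
bytes_per_value = 4  # float32

chunk_bytes = bytes_per_value
for n in chunk_shape:
    chunk_bytes *= n
# ~2.3 MiB per uncompressed chunk

cache_bytes = 4194304  # the 4 MiB budget passed to set_chunk_cache above
chunks_resident = cache_bytes // chunk_bytes
print(chunk_bytes, chunks_resident)

With a chunk this size, the 4 MiB cache holds only one uncompressed chunk at a time, so the cache parameters and the file's chunksizes have to be considered together.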

PR #1019 introduces set_chunk_cache/get_chunk_cache module functions.

Thanks @WardF for debugging this so quickly!

jswhit avatar Jun 16 '20 00:06 jswhit

The lesson, I think, is that chunk parameters are highly data dependent, and it's better for the C library to be conservative with its default values, allowing broadly acceptable (if unoptimized) access speeds, rather than using parameters that work well in some circumstances but are unworkable in others. I expect we will have a maintenance release of the C library in short order that reverts these parameters, barring some other solution presenting itself.

WardF avatar Jun 22 '20 23:06 WardF

I must confess to being confused by your last comment. The issue, I thought, was the cache parameters and not the chunksizes themselves. Correct?

DennisHeimbigner avatar Jun 23 '20 00:06 DennisHeimbigner