netcdf-c

nc_put_var_double execution time increases in subsequent runs when a variable is written with chunking and compression

Open abhibaruah opened this issue 1 year ago • 2 comments

- NetCDF version: 4.9.1
- HDF5 version: 1.10.10
- Platform: Windows 11

I have a reproduction program that creates a netCDF-4 file, defines an NC_DOUBLE variable of shape 2000 x 512 x 512, and writes the whole array to it using chunk sizes of 20 x 10 x 10 and deflate compression at level 5. I run this in a loop of 10 iterations and delete the created .nc file at the end of each iteration.

I time the call to `nc_put_var_double` and observe that its execution time increases with every iteration. This happens only on Windows, and only when the variable is written with both chunking and compression enabled. I see similar behavior with `nc_get_var_int` as well.

Here is the output from the program:

```
In main
Index: 0
In execution
### Execution time is 26.4505
Index: 1
In execution
### Execution time is 30.2897
Index: 2
In execution
### Execution time is 40.5599
Index: 3
In execution
### Execution time is 48.5356
Index: 4
In execution
### Execution time is 51.8821
Index: 5
In execution
### Execution time is 55.844
Index: 6
In execution
### Execution time is 60.3397
Index: 7
In execution
### Execution time is 62.5748
Index: 8
In execution
### Execution time is 67.3835
Index: 9
In execution
### Execution time is 72.4015
```

Here is my reproduction code:

```cpp
#include <stdio.h>    /* printf, remove */
#include <stdlib.h>   /* exit */
#include <chrono>
#include <iostream>
#include <netcdf.h>

/* NetCDF file name */
#define FILE_NAME "sample_xyz.nc"

/* Test with 3D data */
#define NDIMS 3
#define NX 2000
#define NY 512
#define NZ 512

/* Chunk shape: 20 x 10 x 10 values per chunk */
#define CHUNKX 20
#define CHUNKY 10
#define CHUNKZ 10

#define ERRCODE 2
#define ERR(e) {printf("Error: %s\n", nc_strerror(e)); exit(ERRCODE);}

void execution(const double *arr)
{
    std::cout << "In execution" << std::endl;
    int ncid;
    int varid;
    int dimids[NDIMS];
    int x_dimid, y_dimid;
    int retval;
    const size_t chunksize[NDIMS] = { CHUNKX, CHUNKY, CHUNKZ };

    /* Create a NetCDF-4 (HDF5-based) file. */
    if ((retval = nc_create(FILE_NAME, NC_NETCDF4, &ncid)))
        ERR(retval);

    /* Define dimensions. Because NY == NZ, the same 512-length
       dimension is used for both the y and z axes of the variable. */
    if ((retval = nc_def_dim(ncid, "dim_512", NY, &y_dimid)))
        ERR(retval);
    if ((retval = nc_def_dim(ncid, "dim_2000", NX, &x_dimid)))
        ERR(retval);

    dimids[0] = x_dimid;
    dimids[1] = y_dimid;
    dimids[2] = y_dimid;

    /* Define the variable */
    if ((retval = nc_def_var(ncid, "data", NC_DOUBLE, NDIMS,
        dimids, &varid)))
        ERR(retval);

    /* Use chunked storage with the chunk shape above */
    if ((retval = nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunksize)))
        ERR(retval);

    /* Deflate compression: shuffle off, deflate on, level 5 */
    if ((retval = nc_def_var_deflate(ncid, varid, 0, 1, 5)))
        ERR(retval);

    /* Time only the write; steady_clock is monotonic */
    auto timestart = std::chrono::steady_clock::now();
    retval = nc_put_var_double(ncid, varid, arr);
    auto timeend = std::chrono::steady_clock::now();
    if (retval)
        ERR(retval);
    std::chrono::duration<double> duration = timeend - timestart;
    std::cout << "### Execution time is " << duration.count() << std::endl;

    /* Close the file, then delete it before the next iteration */
    if ((retval = nc_close(ncid)))
        ERR(retval);

    remove(FILE_NAME);
}

int main() {

    // Allocate the 3D array (about 4 GB of doubles) and fill it with 5
    const size_t n = (size_t)NX * NY * NZ;
    double* arr = new double[n];
    for (size_t i = 0; i < n; i++)
        arr[i] = 5;

    std::cout << "In main" << std::endl;

    for (int i = 0; i < 10; i++)
    {
        std::cout << "Index: " << i << std::endl;
        execution(arr);
    }

    delete[] arr;
    return 0;
}
```

Let me know if I am doing anything wrong or if any additional information is needed from my side.

abhibaruah · Sep 06 '23 16:09

I wanted to mention that I could reproduce similar behavior in HDF5 as well, so this may be an HDF5 issue rather than a netCDF one.

abhibaruah · Sep 18 '23 12:09

Thanks for your patience; I suspect this is an HDF5 issue (similar to other recent netCDF4 speed issues observed only on Windows), but I will see if I can replicate it!

WardF · Sep 20 '23 00:09