netcdf-c
nc_put_var_double execution time increases in subsequent runs when a variable is written with chunking and compression
NetCDF version: 4.9.1
HDF5 version: 1.10.10
Platform: Windows 11
I have repro code that creates a file, defines an NC_DOUBLE variable of size 2000 x 512 x 512, and writes data to it (with chunk sizes of 20 x 10 x 10 and deflate compression at level 5). I run this code in a for loop of 10 iterations and delete the created .nc file at the end of every iteration.
I time the call to nc_put_var_double and observe that its execution time increases with every subsequent iteration. This happens only on Windows, and only when I write with both chunking and compression enabled. I see similar behavior with nc_get_var_int as well.
Here is the output from the program:
In main
Index: 0
In execution
### Execution time is 26.4505
Index: 1
In execution
### Execution time is 30.2897
Index: 2
In execution
### Execution time is 40.5599
Index: 3
In execution
### Execution time is 48.5356
Index: 4
In execution
### Execution time is 51.8821
Index: 5
In execution
### Execution time is 55.844
Index: 6
In execution
### Execution time is 60.3397
Index: 7
In execution
### Execution time is 62.5748
Index: 8
In execution
### Execution time is 67.3835
Index: 9
In execution
### Execution time is 72.4015
Here is my reproduction code:
#include <stdio.h>
#include <string>
#include <netcdf.h>
#include <chrono>
#include <iostream>
/* NetCDF file names */
#define FILE_NAME "sample_xyz.nc"
/* Test with 3D data */
#define NDIMS 3
#define NX 2000
#define NY 512
#define NZ 512
#define CHUNKX 20
#define CHUNKY 10
#define CHUNKZ 10
#define ERRCODE 2
#define ERR(e) {printf("Error: %s\n", nc_strerror(e)); exit(ERRCODE);}
using namespace std;
using namespace std::chrono;
void execution(double *arr)
{
std::cout << "In execution" << std::endl;
int status;
int ncid;
int varid;
int dimids[NDIMS];
int x_dimid, y_dimid, z_dimid;
int retval;
const size_t chunksize[NDIMS] = { CHUNKX, CHUNKY, CHUNKZ };
/* Create a NetCDF4 file. */
if ((retval = nc_create(FILE_NAME, NC_NETCDF4, &ncid)))
ERR(retval);
/* Define dimensions */
if ((retval = nc_def_dim(ncid, "dim_2000", NX, &x_dimid)))
ERR(retval);
if ((retval = nc_def_dim(ncid, "dim_512_y", NY, &y_dimid)))
ERR(retval);
if ((retval = nc_def_dim(ncid, "dim_512_z", NZ, &z_dimid)))
ERR(retval);
dimids[0] = x_dimid;
dimids[1] = y_dimid;
dimids[2] = z_dimid;
/* Define the variable */
if ((retval = nc_def_var(ncid, "data", NC_DOUBLE, NDIMS,
dimids, &varid)))
ERR(retval);
if ((retval = nc_def_var_chunking(ncid, varid, NC_CHUNKED, chunksize)))
ERR(retval);
if ((retval = nc_def_var_deflate(ncid, varid, 0, 1, 5)))
ERR(retval);
std::chrono::time_point<std::chrono::high_resolution_clock> timestart, timeend;
timestart = std::chrono::high_resolution_clock::now();
retval = nc_put_var_double(ncid, varid, arr);
timeend = std::chrono::high_resolution_clock::now();
if (retval)
ERR(retval);
std::chrono::duration<double> duration = timeend - timestart;
double durationInSeconds = duration.count();
std::cout << "### Execution time is " << durationInSeconds << std::endl;
/* Close and delete the file. */
if ((status = nc_close(ncid)))
ERR(status);
remove(FILE_NAME);
}
int main() {
// Dynamically allocate memory for the 3D array
double* arr = new double[NX * NY * NZ];
int index = 0;
// Traverse the 3D array
for (int i = 0; i < NX; i++) {
for (int j = 0; j < NY; j++) {
for (int k = 0; k < NZ; k++) {
*(arr + index) = 5;
index++;
}
}
}
std::cout << "In main" << std::endl;
for (int i = 0; i < 10; i++)
{
std::cout << "Index: " << i << std::endl;
execution(arr);
}
delete[] arr;
}
Let me know if I am doing anything wrong or if any additional information is needed from my side.
I also wanted to mention that I could reproduce something similar using the HDF5 library directly, so this might well be an HDF5 issue rather than a netCDF one.
Thanks for your patience; I suspect this is an HDF5 issue (similar to other recent netCDF4 speed issues only observed on Windows), but will see if I can replicate!