netcdf-c
nc_get_vars incredibly slow in Windows compared to Linux
OS: Windows 10
NetCDF version: 4.9.1
I am trying to read a 3D double variable (2000 x 512 x 512) from a netCDF4 file with the following parameters:

start[] = {0, 0, 0}
count[] = {1000, 256, 256}
stride[] = {2, 2, 2}
chunk size: {20, 10, 10}
shuffle: no
deflate: yes
deflate_level: 6
I time the call to nc_get_vars. On Debian 11, it takes ~25 seconds. On Windows 10, it takes ~130 seconds.
I would expect Windows to be slightly slower, but a >5x slowdown is unexpected. I see a similar slowdown with 'nc_get_vars_double'.
By contrast, reading the whole variable with 'nc_get_var_double' or 'nc_get_var' is significantly faster (~3 sec on Linux, and ~1 sec on Windows).
- Is there a way to optimize the performance of 'nc_get_vars' or 'nc_get_vars_double' so that Windows performance is closer to Linux performance?
- Is reading the whole variable into memory using 'nc_get_var' and then slicing it later an option? I have seen some discussion of this (https://github.com/Unidata/netcdf-c/issues/908), and that a patch was submitted to make strided reads faster. But for my variable, reading the whole variable still seems to be significantly faster than a strided read (especially on Windows).
Please find the link to the nc file here. Here is my code:
```cpp
#include <stdio.h>
#include <string.h>
#include <netcdf.h>
#include <cstdlib>
#include <iostream>
#include <chrono>

int
main()
{
    int status;
    int ncid;
    int varid;
    int elems_x = 256;
    int elems_y = 256;
    int elems_z = 1000;
    double* outData = (double*)malloc(elems_x * elems_y * elems_z * sizeof(double));

    size_t start[] = {0, 0, 0};
    size_t count[] = {1000, 256, 256};
    ptrdiff_t stride[] = {2, 2, 2};

    // open the NetCDF-4 file
    status = nc_open("repro_nc4file.nc", NC_NOWRITE, &ncid);
    if (status != NC_NOERR) {
        printf("Could not open file.\n");
        return 1;
    }

    // get the varid
    status = nc_inq_varid(ncid, "my_var", &varid);
    printf("status after inq var = %d\n", status);
    printf("varid = %d\n", varid);

    // time the strided read
    auto timestart = std::chrono::high_resolution_clock::now();
    status = nc_get_vars(ncid, varid, start, count, stride, outData);
    auto timeend = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::seconds>(timeend - timestart);
    std::cout << "Execution time: " << duration.count() << " seconds" << std::endl;
    printf("status after getting strided subset = %d\n", status);

    // close the file
    status = nc_close(ncid);
    printf("status after close = %d\n", status);
    free(outData);
    printf("End of test.\n\n");
    return 0;
}
```
I would rewrite the code to use vara and see if the speed problem goes away.
You mean use vara to read the values with stride 1 and then do the slicing later?
Use vara and jump around to get the slicing you need, so you are reading the exact same data, but without vars.
Hello Ed, I tried your recommendation. The issue is that with 'nc_get_vara', I'll have to read twice as many elements along each dimension (since in my original case the stride is 2). So instead of 1000 x 256 x 256 elements, I have to read 2000 x 512 x 512 elements.
Even with nc_get_vara, I still find that Windows is significantly slower:
Windows time: 102 seconds
Linux time: 19 seconds
The only change I made to the previous code is to replace

```cpp
status = nc_get_vars(ncid, varid, start, count, stride, outData);
```

with

```cpp
status = nc_get_vara(ncid, varid, start, count, outData);
```

and to enlarge the buffer: `int elems_x = 512; int elems_y = 512; int elems_z = 2000;`
I am taking a look at this to see if I can determine if the slowdown is in libnetcdf, or if it is something in libhdf5.
@abhibaruah a couple questions, if I may, to ensure I'm on the same page.
- When you say Windows, do you mean Visual Studio, or a gcc variant on Windows?
- What version of libhdf5 are you linking against?
Since we're using libhdf5 for file access, my fear is that this is an issue in libhdf5; that may limit our ability to address this. But it's not necessarily the case. I'll start by reproducing the issue, and go from there :).
Thanks @WardF for taking a look.
- Yes, I am using Visual Studio (VS2019v16.11.7)
- I am linking against HDF5 v1.10.10
I recall that this issue was raised some time ago. If memory serves, we proposed converting the vars code to use the corresponding HDF5 operations (I assume we are talking about netcdf-4 and not netcdf-3), but apparently that proposal was never implemented.
Was the proposed change to use the corresponding HDF5 operations only for Windows? Because for my use case Linux time is reasonable (~20 sec) vs (>100 sec) for Windows.
I'm making some progress on this; I haven't narrowed it down to a solution, yet, but I'm able to replicate the observed issue using netCDF v4.9.1 and HDF5 1.10.10. Testing with netCDF main and HDF5 1.14.1, I see performance in line with what's observed in your linux environment. I'm still trying to determine if the culprit is a change in the netCDF code, or if it's a change in the HDF5 code.
@abhibaruah I'm seeing some mostly consistent results; out of curiosity, can you give it a try with v4.9.2?
Hello @WardF, when you say 'consistent' results, do you mean consistent with the slow speeds I saw, or similar to the speed on Linux?
Currently, we do not have v4.9.2 in our harness, and hence it will be difficult for me to build v4.9.2 with HDF5 v1.10.10 (will have to go through legal and administrative hoops for that).
I can download the Windows binaries from here (https://downloads.unidata.ucar.edu/netcdf/) and give it a try but I am guessing that you must have already tried it.
Let me clarify, thanks :). I'm seeing results consistent with what you've described, and I've been able to reproduce them. I'm not certain what the underlying issue is, but I am seeing much faster speeds using netCDF-C v4.9.2: around 45 seconds instead of > 100 (still slightly slower than on Linux, but that could be because of the VM I'm using, etc.).
I'm at a loss as to why this is only happening on Windows, and will continue trying to figure that out. I've tested with HDF5 1.10.10 as well as HDF5 1.14.1; the results are the same when using v4.9.1 (> 100 seconds), and faster when using netCDF v4.9.2 (< 50 seconds), regardless of which version of HDF5 I'm using.
Just a note to follow up, HDF5 1.14.2 is out, I'm going to try to test this on Windows. I understand there are hoops to jump through, but the issue does appear to be related to the underlying HDF5 library.
@WardF I tried the repro with netCDF 4.9.2 and HDF5 1.10.11. Unfortunately, I am still seeing the same performance difference between Windows 11 and Debian 11.
Windows 11: ~130 s
Debian 11: ~11 s
I am not sure why I am still seeing the slowness on Windows. I created an HDF5 script to mimic my repro above (but with an H5 file), and reading the dataset is much faster (~30 s).
@abhibaruah thank you, that is good to know at least, the HDF5 script does suggest it is something in netCDF, although why it would be Windows specific is puzzling. I'll pop this back to the top of the stack and see what I can sort out.
Hello @WardF, hope you are doing well. I tried the repro steps for this issue with netCDF 4.9.2 + HDF5 1.14.4.3 and I could still see the slowdown.
Windows time: ~123 s
Debian 12 time: ~15 s
Let me know if you find any new information regarding the same.
Thanks, Abhi
I also tried the netCDF repro steps with older versions of netCDF. Here are the results (in seconds).

| Version | Windows | Linux |
|---------|---------|-------|
| 4.6.1   | 284.1   | 228   |
| 4.8.1   | 17.8    | 10.51 |
| 4.9.1   | 115.55  | 12.25 |
| 4.9.2   | 140     | 23    |
Looks like the Windows regression was introduced sometime between 4.8.1 and 4.9.1.
Thank you @abhibaruah, that certainly narrows it down, and thanks for bringing this back to the top of the stack; I will see what I can do to dial it in. If I can come up with a test on Windows to replicate this (I should be able to), I can do a git bisect to narrow it down even further. To answer a question I see you asked separately (while I was out of the office on PTO last week): I'm hoping to have rc2 for 4.9.3 out by the end of next week, and then to move forward with the full release barring any feedback that would prevent that. Thanks!