
Different output file sizes between serial and parallel versions

Open sherimickelson opened this issue 4 years ago • 10 comments

I am using netcdf4-python version 1.5.6 built with netcdf-c version 4.7.4. netcdf-c was built using mpicc (icc version 19.0.5 and mpt version 2.22). I am running this example with miniconda3 -- python 3.6.4 on the cheyenne supercomputer operated by ncar.

My test is very similar to the example you provide, except I have increased the size of the dimension. The header on both netcdf output files is

netcdf parallel_test_0 {
dimensions:
	dim = 160000000 ;
variables:
	double var(dim) ;
}

When I run the parallel Python script, the output file is 1.8GB, while the serial version's output file is 1.2GB. When I change both scripts to output the variable as an int, the parallel output is 1.2GB and the serial output is 611MB. What's interesting is that if I run nccopy on the 1.2GB parallel int file, it creates a file that is 611MB.
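For reference, the serial sizes line up with the raw data sizes, so the extra space is only in the parallel files (a quick arithmetic check):

n = 160000000
print(n * 8 / 2**30)  # doubles: ~1.19 GiB, i.e. the 1.2G serial file
print(n * 4 / 2**20)  # ints:    ~610 MiB, i.e. the 611M serial file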

Here's the parallel python code I am running:

import sys
from mpi4py import MPI
import numpy as np
from netCDF4 import Dataset
import time

total_dsize = 160000000

start_time = time.perf_counter()
format = 'NETCDF4_CLASSIC'
rank = MPI.COMM_WORLD.rank  # The process ID (integer 0-3 for 4-process run)
size = MPI.COMM_WORLD.size

nc = Dataset('parallel_test_'+str(size)+'.nc', 'w', parallel=True, comm=MPI.COMM_WORLD,
        info=MPI.Info(),format=format)
# comm and info are optional - MPI_COMM_WORLD and MPI_INFO_NULL are used by default.
d = nc.createDimension('dim',total_dsize)
v = nc.createVariable('var', np.float64, ('dim',))  # np.float is deprecated; this is NC_DOUBLE
local_dsize = total_dsize // size  # elements per rank (size divides total_dsize evenly here)
start_index = rank * local_dsize
end_index = (rank + 1) * local_dsize
v[start_index:end_index] = float(rank)
# switch to collective mode, rewrite the data.
v.set_collective(True)
v[start_index:end_index] = float(rank)
nc.close()
if rank == 0:
    print(str(size)+':Total time:'+str(time.perf_counter() - start_time))

And here is the serial version of the script

import sys
import numpy as np
from netCDF4 import Dataset
import time

total_dsize = 160000000
rank = 0
start_time = time.perf_counter()
format = 'NETCDF4_CLASSIC'
nc = Dataset('parallel_test_'+str(rank)+'.nc', 'w', format=format)

d = nc.createDimension('dim',total_dsize)
v = nc.createVariable('var', np.float64, ('dim',))  # np.float is deprecated; this is NC_DOUBLE
local_dsize = total_dsize
start_index = 0
end_index = local_dsize
v[start_index:end_index] = float(rank)
v[start_index:end_index] = float(rank)
nc.close()
if rank == 0:
    print(str(rank)+':Total time:'+str(time.perf_counter() - start_time))

ncks is reporting these files to be identical:

> ncks --trd -m parallel_test_0.nc
var: type NC_DOUBLE, 1 dimension, 0 attributes, compressed? no, chunked? no, packed? no
var size (RAM) = 160000000*sizeof(NC_DOUBLE) = 160000000*8 = 1280000000 bytes
var dimension 0: dim, size = 160000000 (Non-coordinate dimension)

> ncks --trd -m parallel_test_1.nc
var: type NC_DOUBLE, 1 dimension, 0 attributes, compressed? no, chunked? no, packed? no
var size (RAM) = 160000000*sizeof(NC_DOUBLE) = 160000000*8 = 1280000000 bytes
var dimension 0: dim, size = 160000000 (Non-coordinate dimension)

but there is something that's different between them

1.8G Feb 17 12:31 parallel_test_1.nc
1.2G Feb 17 12:31 parallel_test_0.nc

parallel_test_0.nc --> serial output
parallel_test_1.nc --> parallel output

Do you know why the parallel version of the script output is larger than the serial version?

sherimickelson avatar Feb 18 '21 21:02 sherimickelson

No, I don't. Must be something to do with how HDF5 allocates space for the data in parallel mode. Just out of curiosity, does the size change as you change the number of MPI tasks?
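One way to dig into that would be to compare the dataset's logical size with what HDF5 actually allocated (a sketch, assuming h5py is installed - a NETCDF4 file is an HDF5 file underneath):

import h5py

# Compare logical bytes vs. bytes HDF5 actually allocated for 'var'.
for fname in ('parallel_test_0.nc', 'parallel_test_1.nc'):
    with h5py.File(fname, 'r') as f:
        ds = f['var']
        logical = ds.size * ds.dtype.itemsize
        allocated = ds.id.get_storage_size()
        print(fname, 'logical:', logical, 'allocated:', allocated)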

I think the best chance to get to the bottom of this is to ask on the HDF forum (https://support.hdfgroup.org/services/community_support.html).

jswhit avatar Feb 18 '21 23:02 jswhit

@jswhit The size does not change when I change the number of MPI tasks. Regardless of how many MPI tasks I use, the file size is either 1.2GB for an int variable or 1.8GB for a float variable. It seems more related to a difference in size_t or block size between the parallel and serial versions.

Thanks for the pointer to the HDF forum. I'll look into posting the question there as well.

sherimickelson avatar Feb 19 '21 15:02 sherimickelson

FWIW, I can reproduce this result on our linux cluster.

jswhit avatar Feb 19 '21 23:02 jswhit

The dataset in your example is not chunked (v.chunking() reports 'contiguous'). If the dimension is set to unlimited, then the dataset is chunked (v.chunking() reports [512]) and the difference in file size disappears.
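A minimal sketch of that change - only the dimension creation differs from your script, and netCDF4 then picks a default chunk shape:

d = nc.createDimension('dim', None)   # None => unlimited dimension
v = nc.createVariable('var', np.float64, ('dim',))
print(v.chunking())                   # now reports a chunk shape instead of 'contiguous'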

jswhit avatar Feb 19 '21 23:02 jswhit

Thank you @jswhit for your response and your help. Besides setting the size of the dimension to None, are there any other changes that I need to make? I have removed the first call to v[start_index:end_index] = data_arr and I'm only setting the values after the v.set_collective(True) call, but I'm seeing this error

Traceback (most recent call last):
  File "test.py", line 33, in <module>
    v[start_index:end_index] = data_arr
  File "src/netCDF4/_netCDF4.pyx", line 4960, in netCDF4._netCDF4.Variable.__setitem__
  File "src/netCDF4/_netCDF4.pyx", line 5239, in netCDF4._netCDF4.Variable._put
  File "src/netCDF4/_netCDF4.pyx", line 1950, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: HDF error

Do you know if I need to make any other changes?

sherimickelson avatar Feb 22 '21 21:02 sherimickelson

collective IO has to be on when writing to an unlimited dimension, so you'll have to comment out the first write (before collective IO is turned on)
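i.e. something like this ordering, using the variable names from your script:

# v was created with an unlimited dimension
v.set_collective(True)                  # collective IO on *before* any write
v[start_index:end_index] = float(rank)  # the only write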

jswhit avatar Feb 23 '21 01:02 jswhit

Thank you @jswhit. I do have that line commented out, and my code only includes the one write statement after the v.set_collective(True) command:

nc = Dataset('parallel_test_'+str(size)+'.nc', 'w', parallel=True, comm=MPI.COMM_WORLD,
        info=MPI.Info(),format=format)
d = nc.createDimension('dim',total_dsize)
t = nc.createDimension('time', size=None)
v = nc.createVariable('var', np.float64, ('time','dim'))  # np.float is deprecated
local_dsize = total_dsize // size
start_index = rank * local_dsize
end_index = (rank + 1) * local_dsize
data_arr = np.random.uniform(low=280,high=330,size=(1,local_dsize))
print(str(rank)+" "+str(start_index)+" "+str(end_index))
print(v.shape)
print(data_arr.shape)
print(v.chunking())
v.set_collective(True)
v[:,start_index:end_index] = data_arr
nc.close()

and I'm still getting the generic error message RuntimeError: NetCDF: HDF error from the write statement.
The v.chunking() statement gives me [1, 522876], so it looks like it's chunked and not contiguous - so some progress :)

Do you see anything else that I could be doing wrong that would give me that generic error message when it tries to write?

sherimickelson avatar Feb 23 '21 23:02 sherimickelson

Works for me with

netcdf4-python version: 1.5.5.1
HDF5 lib version: 1.10.6
netcdf lib version: 4.7.4
numpy version: 1.19.4

What version of HDF5 do you have?
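A quick way to print the same info from netcdf4-python (these module attributes exist in recent releases; __has_parallel4_support__ also tells you whether the underlying netcdf-c was built with parallel HDF5 support):

import netCDF4, numpy
print('netcdf4-python version:', netCDF4.__version__)
print('HDF5 lib version:', netCDF4.__hdf5libversion__)
print('netcdf lib version:', netCDF4.__netcdf4libversion__)
print('numpy version:', numpy.__version__)
print('parallel HDF5 support:', netCDF4.__has_parallel4_support__)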

jswhit avatar Feb 24 '21 01:02 jswhit

Thanks @jswhit. I had been using more recent versions of hdf5 and netcdf4-python, but an older version of numpy. I've created an environment with the same versions that you're using, but I still get the same error: RuntimeError: NetCDF: HDF error. Which compiler did you use to build mpicc, and which version of MPI are you using? I'm using icc version 19.0.5 and mpt version 2.22, and I was wondering if that's making a difference.

sherimickelson avatar Feb 24 '21 21:02 sherimickelson

I'm using the conda packages for mpi4py etc., so I'm pretty sure they are based on gcc and openmpi.

jswhit avatar Feb 24 '21 23:02 jswhit