netcdf-c
large file tests hang at run_diskless2.sh
@DennisHeimbigner when trying to run large file tests (i.e. with --enable-large-file-tests) the test run_diskless2.sh hangs. It's been like this for more than an hour.
This is on a powerful multi-core machine with plenty of memory, so if the test can't run on this machine, it's too hard! ;-)
ed@mikado:~/netcdf-c/nc_test$ bash -x ./run_diskless2.sh
+ test x = x
++ pwd
+ srcdir=/home/ed/netcdf-c/nc_test
+ . ../test_common.sh
++ TOPSRCDIR=/home/ed/netcdf-c
++ TOPBUILDDIR=/home/ed/netcdf-c
++ FP_ISCMAKE=
++ FP_ISMSVC=
++ FP_WINVERMAJOR=0
++ FP_WINVERBUILD=0
++ FP_ISCYGWIN=
++ FP_ISMINGW=
++ FP_ISMSYS=
++ FP_ISOSX=
++ FP_ISREGEDIT=yes
++ FP_USEPLUGINS=yes
++ FP_ISREGEDIT=yes
++ FEATURE_HDF5=yes
++ FEATURE_HDF5=yes
++ FEATURE_S3TESTS=no
++ FEATURE_NCZARR_ZIP=no
++ FEATURE_FILTERTESTS=yes
++ set -e
++ test x = x1
+++ uname
++ system=Linux
++ test xLinux = x
++ top_srcdir=/home/ed/netcdf-c
++ top_builddir=/home/ed/netcdf-c
++ test x/home/ed/netcdf-c/nc_test = x
+++ pwd
++ builddir=/home/ed/netcdf-c/nc_test
++ execdir=/home/ed/netcdf-c/nc_test
+++ basename /home/ed/netcdf-c/nc_test
++ thisdir=nc_test
+++ pwd
++ WD=/home/ed/netcdf-c/nc_test
++ cd /home/ed/netcdf-c/nc_test
+++ pwd
++ srcdir=/home/ed/netcdf-c/nc_test
++ cd /home/ed/netcdf-c/nc_test
++ cd /home/ed/netcdf-c
+++ pwd
++ top_srcdir=/home/ed/netcdf-c
++ cd /home/ed/netcdf-c/nc_test
++ cd /home/ed/netcdf-c/nc_test
+++ pwd
++ builddir=/home/ed/netcdf-c/nc_test
++ cd /home/ed/netcdf-c/nc_test
++ cd /home/ed/netcdf-c
+++ pwd
++ top_builddir=/home/ed/netcdf-c
++ cd /home/ed/netcdf-c/nc_test
++ cd /home/ed/netcdf-c/nc_test
+++ pwd
++ execdir=/home/ed/netcdf-c/nc_test
++ cd /home/ed/netcdf-c/nc_test
++ export srcdir top_srcdir builddir top_builddir execdir
++ test -e /home/ed/netcdf-c/ncdump/ncdump.exe
++ ext=
++ export NCDUMP=/home/ed/netcdf-c/ncdump/ncdump
++ NCDUMP=/home/ed/netcdf-c/ncdump/ncdump
++ export NCCOPY=/home/ed/netcdf-c/ncdump/nccopy
++ NCCOPY=/home/ed/netcdf-c/ncdump/nccopy
++ export NCGEN=/home/ed/netcdf-c/ncgen/ncgen
++ NCGEN=/home/ed/netcdf-c/ncgen/ncgen
++ export NCGEN3=/home/ed/netcdf-c/ncgen3/ncgen3
++ NCGEN3=/home/ed/netcdf-c/ncgen3/ncgen3
++ export NCPATHCVT=/home/ed/netcdf-c/ncdump/ncpathcvt
++ NCPATHCVT=/home/ed/netcdf-c/ncdump/ncpathcvt
++ ncgen3c0=/home/ed/netcdf-c/ncgen3/c0.cdl
++ ncgenc0=/home/ed/netcdf-c/ncgen/c0.cdl
++ ncgenc04=/home/ed/netcdf-c/ncgen/c0_4.cdl
++ test x = xyes
++ test x = xyes
++ cd /home/ed/netcdf-c/nc_test
+ set -e
+ test x/home/ed/netcdf-c/nc_test = x
+ . ../test_common.sh
++ TOPSRCDIR=/home/ed/netcdf-c
++ TOPBUILDDIR=/home/ed/netcdf-c
++ FP_ISCMAKE=
++ FP_ISMSVC=
++ FP_WINVERMAJOR=0
++ FP_WINVERBUILD=0
++ FP_ISCYGWIN=
++ FP_ISMINGW=
++ FP_ISMSYS=
++ FP_ISOSX=
++ FP_ISREGEDIT=yes
++ FP_USEPLUGINS=yes
++ FP_ISREGEDIT=yes
++ FEATURE_HDF5=yes
++ FEATURE_HDF5=yes
++ FEATURE_S3TESTS=no
++ FEATURE_NCZARR_ZIP=no
++ FEATURE_FILTERTESTS=yes
++ set -e
++ test x = x1
+++ uname
++ system=Linux
++ test xLinux = x
++ top_srcdir=/home/ed/netcdf-c
++ top_builddir=/home/ed/netcdf-c
++ test x/home/ed/netcdf-c/nc_test = x
+++ pwd
++ builddir=/home/ed/netcdf-c/nc_test
++ execdir=/home/ed/netcdf-c/nc_test
+++ basename /home/ed/netcdf-c/nc_test
++ thisdir=nc_test
+++ pwd
++ WD=/home/ed/netcdf-c/nc_test
++ cd /home/ed/netcdf-c/nc_test
+++ pwd
++ srcdir=/home/ed/netcdf-c/nc_test
++ cd /home/ed/netcdf-c/nc_test
++ cd /home/ed/netcdf-c
+++ pwd
++ top_srcdir=/home/ed/netcdf-c
++ cd /home/ed/netcdf-c/nc_test
++ cd /home/ed/netcdf-c/nc_test
+++ pwd
++ builddir=/home/ed/netcdf-c/nc_test
++ cd /home/ed/netcdf-c/nc_test
++ cd /home/ed/netcdf-c
+++ pwd
++ top_builddir=/home/ed/netcdf-c
++ cd /home/ed/netcdf-c/nc_test
++ cd /home/ed/netcdf-c/nc_test
+++ pwd
++ execdir=/home/ed/netcdf-c/nc_test
++ cd /home/ed/netcdf-c/nc_test
++ export srcdir top_srcdir builddir top_builddir execdir
++ test -e /home/ed/netcdf-c/ncdump/ncdump.exe
++ ext=
++ export NCDUMP=/home/ed/netcdf-c/ncdump/ncdump
++ NCDUMP=/home/ed/netcdf-c/ncdump/ncdump
++ export NCCOPY=/home/ed/netcdf-c/ncdump/nccopy
++ NCCOPY=/home/ed/netcdf-c/ncdump/nccopy
++ export NCGEN=/home/ed/netcdf-c/ncgen/ncgen
++ NCGEN=/home/ed/netcdf-c/ncgen/ncgen
++ export NCGEN3=/home/ed/netcdf-c/ncgen3/ncgen3
++ NCGEN3=/home/ed/netcdf-c/ncgen3/ncgen3
++ export NCPATHCVT=/home/ed/netcdf-c/ncdump/ncpathcvt
++ NCPATHCVT=/home/ed/netcdf-c/ncdump/ncpathcvt
++ ncgen3c0=/home/ed/netcdf-c/ncgen3/c0.cdl
++ ncgenc0=/home/ed/netcdf-c/ncgen/c0.cdl
++ ncgenc04=/home/ed/netcdf-c/ncgen/c0_4.cdl
++ test x = xyes
++ test x = xyes
++ cd /home/ed/netcdf-c/nc_test
++ uname -p
+ CPU=x86_64
++ uname
+ OS=Linux
+ SIZE=500000000
+ FILE4=tst_diskless4.nc
+ rm -fr ref_tst_diskless4.cdl
+ cat
+ echo ''
+ rm -f tst_diskless4.nc
+ ./tst_diskless4 500000000 create
*** Create file
ok.
+ /home/ed/netcdf-c/ncdump/ncdump -h tst_diskless4.nc
+ diff -w - ref_tst_diskless4.cdl
+ echo ''
+ rm -f tst_diskless4.nc
+ ./tst_diskless4 500000000 creatediskless
*** Create file diskless
ok.
+ /home/ed/netcdf-c/ncdump/ncdump -h tst_diskless4.nc
+ diff -w - ref_tst_diskless4.cdl
+ echo ''
+ ./tst_diskless4 500000000 open
*** Open file
ok.
+ echo ''
+ ./tst_diskless4 500000000 opendiskless
*** Open file diskless
When I comment out running run_diskless2.sh, then all tests pass.
The purpose of that test is to create a large in-memory file of size 500 megabytes. So I think the issue is not the number of processors, but rather the amount of virtual memory available.
I have ~57 GB of available memory:
free -g
              total        used        free      shared  buff/cache   available
Mem:             62           4          27           0          31          57
Swap:            93           0          93
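One way to check the virtual-memory hypothesis before running the test is to compare the requested size against what the kernel reports and against the per-process limit. The following is a minimal sketch, assuming Linux (`/proc/meminfo`) and a POSIX shell; the function name `enough_memory` and its optional second argument are hypothetical and not part of the netcdf-c test scripts.

```shell
#!/bin/sh
# Hypothetical helper: succeed if SIZE bytes fit within MemAvailable and
# under the per-process virtual memory limit (ulimit -v, reported in kB).
enough_memory() {
    size_bytes=$1
    meminfo=${2:-/proc/meminfo}   # overridable, e.g. for testing
    # MemAvailable is reported in kB.
    avail_kb=$(awk '/^MemAvailable:/ {print $2}' "$meminfo")
    [ -n "$avail_kb" ] || return 2
    need_kb=$(( (size_bytes + 1023) / 1024 ))
    limit_kb=$(ulimit -v)
    if [ "$limit_kb" != "unlimited" ] && [ "$need_kb" -gt "$limit_kb" ]; then
        echo "ulimit -v ($limit_kb kB) is below the $need_kb kB requested"
        return 1
    fi
    if [ "$need_kb" -gt "$avail_kb" ]; then
        echo "only $avail_kb kB available, $need_kb kB requested"
        return 1
    fi
    echo "ok: $need_kb kB requested, $avail_kb kB available"
}

# Usage, with the SIZE value from the trace:
#   enough_memory 500000000
```

If this reports "ok" for 500000000 bytes, a shortage of memory is an unlikely explanation for the hang.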
Refresh my memory; will each processor try to allocate the memory or will only one processor do it?
This is a sequential test so is only running on one processor...
Well I will suppress this test if running parallel. Hope that will fix the problem.
Fixed by PR https://github.com/Unidata/netcdf-c/pull/2316 ?
I would suggest that, while #2316 addresses the fact that the test suite hangs on this configuration, it would be good to leave this issue open, since no root cause has been identified. For all we know, this could be due to some nasty bug lurking somewhere. Unless I missed something, and we would actually expect a test that runs on a single processor to fail in a parallel configuration.
Agreed that this has an underlying issue that will need to be resolved.
Since this test is allocating a 500 MB block of virtual memory, my suspicion is that it is something like the one processor creating the block has to somehow pass (or copy) it to all the other processors before it can continue. Thoughts from Ed or Wei-King would be welcome.
I don't think so. This is not a parallel I/O problem - it hangs in sequential mode too.
@DennisHeimbigner does this test work for you on your machine?
> is that it is something like the one processor creating the block has to somehow pass (or copy) it to all the other processors before it can continue.
Even if it was running in parallel, I would be completely and utterly shocked if that necessitated an actual copy rather than being handled by virtual addressing. And even if it did copy, with today's memory bandwidth, that should take less than a second.
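The estimate above can be made concrete with a back-of-the-envelope calculation. This sketch assumes a sustained copy bandwidth of 10 GB/s, which is an illustrative figure and not a measurement from this machine; `copy_seconds` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: seconds needed to copy BYTES at BANDWIDTH bytes/s.
copy_seconds() {
    awk -v bytes="$1" -v bw="$2" 'BEGIN { printf "%.3f\n", bytes / bw }'
}

# 500000000 bytes (the SIZE used by run_diskless2.sh) at an assumed 10 GB/s.
copy_seconds 500000000 10000000000   # prints 0.050
```

Even at a tenth of that assumed bandwidth the copy would take about half a second, so a copy alone would not account for a hang lasting over an hour.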
I pass. I have no other ideas about what might be happening. Perhaps someone can apply a debugger or do some profiling to find out.
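A first step in that direction could be to bound the hanging command with a time limit, so the run fails instead of stalling, and then inspect the live process. This is a minimal sketch, assuming GNU coreutils `timeout` is installed; the `run_bounded` name and the 60 second limit are arbitrary choices for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command with a time limit and report a hang.
# GNU timeout exits with status 124 when the limit is reached.
run_bounded() {
    limit=$1; shift
    timeout "$limit" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$status"
}

# Usage, from the nc_test directory (command taken from the trace above):
#   run_bounded 60 ./tst_diskless4 500000000 opendiskless
# To see where a still-running process is stuck, gdb can attach to it:
#   gdb -p "$(pgrep -n tst_diskless4)" -batch -ex bt
```

A backtrace from the stuck process would show whether it is spinning in a read or allocation loop or blocked in a system call.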
This should be resolved in PR #2319
This is fixed. I will close this issue.