PnetCDF
PnetCDF copied to clipboard
Type conversion error r8 -> r4
The following code, derived from put_var.f90, demonstrates a problem with R8->R4 conversion in pnetcdf.
IBM XL C/C++ for Linux, V16.1.1 (Beta 6)
Version: 16.01.0001.0000
ppc64le
!
! Copyright (C) 2013, Northwestern University and Argonne National Laboratory
! See COPYRIGHT notice in top-level directory.
!
! $Id$
!
! This example shows how to use nf90mpi_put_var_all() to write a 2D
! 4-byte integer array in parallel. It first defines a netCDF variable of
! size global_nx * global_ny where
! global_nx == 5 and
! global_ny == (4 * number of MPI processes).
! The data partitioning pattern is a column-wise partitioning across all
! processes. Each process writes a subarray of size nx * ny.
! Note the description above follows the Fortran array index order.
!
! Example commands for MPI run and outputs from running ncmpidump on the
! NC file produced by this example program:
!
! % mpif90 -O2 -o put_var put_var.f90 -lpnetcdf
! % mpiexec -n 4 ./put_var /pvfs2/wkliao/testfile.nc
!
! % ncmpidump /pvfs2/wkliao/testfile.nc
! netcdf testfile {
! // file format: CDF-5 (big variables)
! dimensions:
! x = 5 ;
! y = 16 ;
! variables:
! int var(y, x) ;
! data:
!
! var =
! 0, 0, 0, 0, 0,
! 0, 0, 0, 0, 0,
! 0, 0, 0, 0, 0,
! 0, 0, 0, 0, 0,
! 1, 1, 1, 1, 1,
! 1, 1, 1, 1, 1,
! 1, 1, 1, 1, 1,
! 1, 1, 1, 1, 1,
! 2, 2, 2, 2, 2,
! 2, 2, 2, 2, 2,
! 2, 2, 2, 2, 2,
! 2, 2, 2, 2, 2,
! 3, 3, 3, 3, 3,
! 3, 3, 3, 3, 3,
! 3, 3, 3, 3, 3,
! 3, 3, 3, 3, 3 ;
! }
!
subroutine check(err, message)
use mpi
use pnetcdf
implicit none
integer err
character(len=*) message
! It is a good idea to check returned value for possible error
if (err .NE. NF90_NOERR) then
write(6,*) trim(message), trim(nf90mpi_strerror(err))
call MPI_Abort(MPI_COMM_WORLD, -1, err)
end if
end subroutine check
program main
use mpi
use pnetcdf
implicit none
integer,parameter :: R8 = selected_real_kind(12) ! 8 byte real
integer,parameter :: R4 = selected_real_kind( 6) ! 4 byte real
character(LEN=256) filename, cmd
integer err, nprocs, rank, ierr, get_args, dummy
integer cmode, ncid, varid, dimid(2), varid2
integer(kind=MPI_OFFSET_KIND) nx, ny, global_nx, global_ny
integer(kind=MPI_OFFSET_KIND) starts(2), counts(2)
integer(kind=MPI_OFFSET_KIND) malloc_size, sum_size
PARAMETER(nx=5, ny=4)
real(kind=R8):: buf(nx,ny)
logical verbose
call MPI_Init(err)
call MPI_Comm_rank(MPI_COMM_WORLD, rank, err)
call MPI_Comm_size(MPI_COMM_WORLD, nprocs, err)
! take filename from command-line argument if there is any
if (rank .EQ. 0) then
verbose = .TRUE.
filename = "testfile.nc"
ierr = get_args(2, cmd, filename, verbose, dummy)
endif
call MPI_Bcast(ierr, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, err)
if (ierr .EQ. 0) goto 999
call MPI_Bcast(verbose, 1, MPI_LOGICAL, 0, MPI_COMM_WORLD, err)
call MPI_Bcast(filename, 256, MPI_CHARACTER, 0, MPI_COMM_WORLD, err)
! set parameters
global_nx = nx
global_ny = ny * nprocs
buf = real(rank,kind=R8);
! create file, truncate it if exists
cmode = IOR(NF90_CLOBBER, NF90_64BIT_DATA)
err = nf90mpi_create(MPI_COMM_WORLD, filename, cmode, &
MPI_INFO_NULL, ncid)
call check(err, 'In nf90mpi_create: ')
! define dimensions x and y
err = nf90mpi_def_dim(ncid, "x", global_nx, dimid(1))
call check(err, 'In nf90mpi_def_dim x: ')
err = nf90mpi_def_dim(ncid, "y", global_ny, dimid(2))
call check(err, 'In nf90mpi_def_dim y: ')
! define a 2D variable of double type
err = nf90mpi_def_var(ncid, "var", NF90_DOUBLE, dimid, varid)
call check(err, 'In nf90mpi_def_var: ')
! define a 2D variable of real type
err = nf90mpi_def_var(ncid, "var2", NF90_REAL, dimid, varid2)
call check(err, 'In nf90mpi_def_var: ')
! do not forget to exit define mode
err = nf90mpi_enddef(ncid)
call check(err, 'In nf90mpi_enddef: ')
! now we are in data mode
! Note that in Fortran, array indices start with 1
starts(1) = 1
starts(2) = ny * rank + 1
counts(1) = nx
counts(2) = ny
err = nf90mpi_put_var_all(ncid, varid, buf, starts, counts)
call check(err, 'In nf90mpi_put_var_all: ')
err = nf90mpi_put_var_all(ncid, varid2, buf, starts, counts)
call check(err, 'In nf90mpi_put_var_all: ')
! close the file
err = nf90mpi_close(ncid)
call check(err, 'In nf90mpi_close: ')
! check if there is any PnetCDF internal malloc residue
998 format(A,I13,A)
err = nf90mpi_inq_malloc_size(malloc_size)
if (err == NF90_NOERR) then
call MPI_Reduce(malloc_size, sum_size, 1, MPI_INTEGER8, &
MPI_SUM, 0, MPI_COMM_WORLD, err)
if (rank .EQ. 0 .AND. sum_size .GT. 0_MPI_OFFSET_KIND) print 998, &
'heap memory allocated by PnetCDF internally has ', &
sum_size/1048576, ' MiB yet to be freed'
endif
999 call MPI_Finalize(err)
! call EXIT(0) ! EXIT() is a GNU extension
end program main
The resulting testfile.nc is
netcdf testfile {
dimensions:
x = 5 ;
y = 16 ;
variables:
double var(y, x) ;
float var2(y, x) ;
data:
var =
0, 0, 0, 0, 0,
0, 0, 0, 0, 0,
0, 0, 0, 0, 0,
0, 0, 0, 0, 0,
1, 1, 1, 1, 1,
1, 1, 1, 1, 1,
1, 1, 1, 1, 1,
1, 1, 1, 1, 1,
2, 2, 2, 2, 2,
2, 2, 2, 2, 2,
2, 2, 2, 2, 2,
2, 2, 2, 2, 2,
3, 3, 3, 3, 3,
3, 3, 3, 3, 3,
3, 3, 3, 3, 3,
3, 3, 3, 3, 3 ;
var2 =
-1.251217e+16, -1.251217e+16, -1.251217e+16, -1.251217e+16, -1.251217e+16,
-1.251217e+16, -1.251217e+16, -1.251217e+16, -1.251217e+16, -1.251217e+16,
-1.251217e+16, -1.251217e+16, -1.251217e+16, -1.251217e+16, -1.251217e+16,
-1.251217e+16, -1.251217e+16, -1.251217e+16, -1.251217e+16, -1.251217e+16,
-3.695857e+10, -3.695857e+10, -3.695857e+10, -3.695857e+10, -3.695857e+10,
-3.695857e+10, -3.695857e+10, -3.695857e+10, -3.695857e+10, -3.695857e+10,
-3.695857e+10, -3.695857e+10, -3.695857e+10, -3.695857e+10, -3.695857e+10,
-3.695857e+10, -3.695857e+10, -3.695857e+10, -3.695857e+10, -3.695857e+10,
-6.574138e+07, -6.574138e+07, -6.574138e+07, -6.574138e+07, -6.574138e+07,
-6.574138e+07, -6.574138e+07, -6.574138e+07, -6.574138e+07, -6.574138e+07,
-6.574138e+07, -6.574138e+07, -6.574138e+07, -6.574138e+07, -6.574138e+07,
-6.574138e+07, -6.574138e+07, -6.574138e+07, -6.574138e+07, -6.574138e+07,
-28308.59, -28308.59, -28308.59, -28308.59, -28308.59,
-28308.59, -28308.59, -28308.59, -28308.59, -28308.59,
-28308.59, -28308.59, -28308.59, -28308.59, -28308.59,
-28308.59, -28308.59, -28308.59, -28308.59, -28308.59 ;
}
Hi, @jedwards4b Does the same happen to C program?
Yes, it also appears to happen with a C program, here is a test:
#include <stdio.h>
#include <stdlib.h>
#include <string.h> /* strcpy(), strncpy() */
#include <unistd.h> /* getopt() */
#include <time.h> /* time() localtime(), asctime() */
#include <mpi.h>
#include <pnetcdf.h>
#define NY 10
#define NX 4
static int verbose;
#define ERR {if(err!=NC_NOERR){printf("Error at %s:%d : %s\n", __FILE__,__LINE__, ncmpi_strerror(err));nerrs++;}}
static void
usage(char *argv0)
{
char *help =
"Usage: %s [-h] | [-q] [-k format] [file_name]\n"
" [-h] Print help\n"
" [-q] Quiet mode (reports when fail)\n"
" [-k format] file format: 1 for CDF-1, 2 for CDF-2, 3 for NetCDF4,\n"
" 4 for NetCDF4 classic model, 5 for CDF-5\n"
" [filename] output netCDF file name\n";
fprintf(stderr, help, argv0);
}
/*----< pnetcdf_check_mem_usage() >------------------------------------------*/
/* check PnetCDF library internal memory usage */
static int
pnetcdf_check_mem_usage(MPI_Comm comm)
{
int err, nerrs=0, rank;
MPI_Offset malloc_size, sum_size;
MPI_Comm_rank(comm, &rank);
/* print info about PnetCDF internal malloc usage */
err = ncmpi_inq_malloc_max_size(&malloc_size);
if (err == NC_NOERR) {
MPI_Reduce(&malloc_size, &sum_size, 1, MPI_OFFSET, MPI_SUM, 0, MPI_COMM_WORLD);
if (rank == 0 && verbose)
printf("maximum heap memory allocated by PnetCDF internally is %lld bytes\n",
sum_size);
/* check if there is any PnetCDF internal malloc residue */
err = ncmpi_inq_malloc_size(&malloc_size);
MPI_Reduce(&malloc_size, &sum_size, 1, MPI_OFFSET, MPI_SUM, 0, MPI_COMM_WORLD);
if (rank == 0 && sum_size > 0)
printf("heap memory allocated by PnetCDF internally has %lld bytes yet to be freed\n",
sum_size);
}
else {
printf("Error at %s:%d: %s\n", __FILE__,__LINE__,ncmpi_strerror(err));
nerrs++;
}
return nerrs;
}
/*----< pnetcdf_io() >-------------------------------------------------------*/
static int
pnetcdf_io(MPI_Comm comm, char *filename, int cmode)
{
int i, j, rank, nprocs, err, nerrs=0;
int ncid, varid, dimid[2];
double buf[NY][NX];
char str_att[128];
float float_att[100];
MPI_Offset global_ny, global_nx;
MPI_Offset start[2], count[2];
MPI_Comm_rank(comm, &rank);
MPI_Comm_size(comm, &nprocs);
/* create a new file for writing ----------------------------------------*/
cmode |= NC_CLOBBER;
err = ncmpi_create(comm, filename, cmode, MPI_INFO_NULL, &ncid); ERR
/* the global array is NY * (NX * nprocs) */
global_ny = NY;
global_nx = NX * nprocs;
for (i=0; i<NY; i++)
for (j=0; j<NX; j++)
buf[i][j] = (float) rank;
/* add a global attribute: a time stamp at rank 0 */
time_t ltime = time(NULL); /* get the current calendar time */
asctime_r(localtime(<ime), str_att);
sprintf(str_att, "Mon Aug 13 21:27:48 2018");
/* make sure the time string are consistent among all processes */
MPI_Bcast(str_att, sizeof(str_att), MPI_CHAR, 0, comm);
err = ncmpi_put_att_text(ncid, NC_GLOBAL, "history", strlen(str_att),
&str_att[0]); ERR
/* define dimensions x and y */
err = ncmpi_def_dim(ncid, "Y", global_ny, &dimid[0]); ERR
err = ncmpi_def_dim(ncid, "X", global_nx, &dimid[1]); ERR
/* define a 2D variable of integer type */
err = ncmpi_def_var(ncid, "var", NC_FLOAT, 2, dimid, &varid); ERR
/* add attributes to the variable */
strcpy(str_att, "example attribute of type text.");
err = ncmpi_put_att_text(ncid, varid, "str_att_name", strlen(str_att),
&str_att[0]); ERR
for (i=0; i<8; i++) float_att[i] = i;
err = ncmpi_put_att_float(ncid, varid, "float_att_name", NC_FLOAT, 8,
&float_att[0]); ERR
short short_att=1000;
err = ncmpi_put_att_short(ncid, varid, "short_att_name", NC_SHORT, 1,
&short_att); ERR
/* do not forget to exit define mode */
err = ncmpi_enddef(ncid); ERR
/* now we are in data mode */
start[0] = 0;
start[1] = NX * rank;
count[0] = NY;
count[1] = NX;
err = ncmpi_put_vara_all(ncid, varid, start, count, &buf[0][0], NY*NX, MPI_DOUBLE); ERR
err = ncmpi_close(ncid); ERR
return nerrs;
}
int main(int argc, char** argv)
{
extern int optind;
char filename[256];
int i, rank, kind=0, cmode=0, nerrs=0;
MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
verbose = 1;
/* get command-line arguments */
while ((i = getopt(argc, argv, "hqk:")) != EOF)
switch(i) {
case 'q': verbose = 0;
break;
case 'k': kind = atoi(optarg);
break;
case 'h':
default: if (rank==0) usage(argv[0]);
MPI_Finalize();
return 1;
}
if (argv[optind] == NULL) strcpy(filename, "testfile.nc");
else snprintf(filename, 256, "%s", argv[optind]);
MPI_Bcast(filename, 256, MPI_CHAR, 0, MPI_COMM_WORLD);
if (verbose && rank == 0) printf("%s: example of using put_vara APIs\n",__FILE__);
switch (kind) {
case(2): cmode = NC_64BIT_OFFSET; break;
case(3): cmode = NC_NETCDF4; break;
case(4): cmode = NC_NETCDF4|NC_CLASSIC_MODEL; break;
case(5): cmode = NC_64BIT_DATA; break;
default: cmode = 0;
}
#ifndef PNETCDF_DRIVER_NETCDF4
/* netcdf4 driver is not enabled, skip */
if (kind == 3 || kind == 4) {
MPI_Finalize();
return 0;
}
#endif
nerrs += pnetcdf_io(MPI_COMM_WORLD, filename, cmode);
nerrs += pnetcdf_check_mem_usage(MPI_COMM_WORLD);
MPI_Finalize();
return (nerrs > 0);
}
It seems to be an xl compiler issue. The same test using pgi is okay. (on the same machine and using the same MPI library)
I opened a defect against XL for this issue. For followup purposes with the XL beta liaison for your organization, this is defect 158051 in our defect tracking system.
Hi @jedwards4b You mentioned earlier that the issue does not happen to serial netcdf. I am wondering if this is still the case.
HI @wkliao the IBM folks tell me that a workaround is to compile without optimization. It could be that the netcdf build was without -O and the pnetcdf c build had -O. I will look into that tomorrow and verify.
The file that's miscompiled is ncx.lo. So you can limit the workaround to this file by adding the following to your local version of the build tree: In src/drivers/common/Makefile, add:
ncx.lo: LTCOMPILE += -qnoopt
before the ".c.lo:" recipe.
Another alternative is to use -qnoinline instead of -qnoopt in your local copy of src/drivers/common/Makefile
ncx.lo: LTCOMPILE += -qnoinline
We believe this is a source issue and not a compiler bug. It appears that ncx.c breaks strict ANSI C aliasing rules in regards to how put_ix_float / get_ix_float are used. That's why the file was miscompiled. The file should either be changed to follow strict ANSI C aliasing, or -fno-strict-aliasing should be used to compile it. e.g. ncx.lo: LTCOMPILE += -fno-strict-aliasing
I recall this strict aliasing error happens before to some xlc compilers. At that time, I added "-fno-inline" to work around. Will this and removing all inline directive from the source codes resolve the problem?
Our recommendation is to add -fno-strict-aliasing for this file. You can keep the inline directives. -fno-strict-aliasing is a big hammer, but it will do the right thing. Removing inlining might work with our current optimizer, but future ones might try more aggressive optimizations that will make the problem appear again.
Adding the flag below solves the problem.
ncx.lo: LTCOMPILE += -qnoinline
On Wed, Oct 24, 2018 at 8:55 AM Rafik Zurob [email protected] wrote:
Our recommendation is to add -fno-strict-aliasing for this file. You can keep the inline directives. -fno-strict-aliasing is a big hammer, but it will do the right thing. Removing inlining might work with our current optimizer, but future ones might try more aggressive optimizations that will make the problem appear again.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Parallel-NetCDF/PnetCDF/issues/23#issuecomment-432691794, or mute the thread https://github.com/notifications/unsubscribe-auth/AF16GD_8f0kyIMsjGJMFELUR8HtavFuxks5uoH9QgaJpZM4Xznu7 .
-- Jim Edwards
CESM Software Engineer National Center for Atmospheric Research Boulder, CO
Hi @rzurob and @jedwards4b May I assume you are testing this on a Big Endian machine? The source codes of put_ix_float in question are shown below. As you can see, memcpy() is called if the machine is big endian. I can understand the strict aliasing error on little endian machine because of the byte swap operation in swap4b, but not memcpy. Can you verify?
inline static void
put_ix_float(void *xp, const float *ip)
{
#ifdef WORDS_BIGENDIAN
(void) memcpy(xp, ip, X_SIZEOF_FLOAT);
#else
swap4b(xp, ip);
#endif
}
It's a little endian
On Wed, Oct 24, 2018 at 12:00 PM Wei-keng Liao [email protected] wrote:
Hi @rzurob https://github.com/rzurob and @jedwards4b https://github.com/jedwards4b May I assume you are testing this on a Big Endian machine? The source codes of put_ix_float in question are shown below. As you can see, memcpy() is called if the machine is big endian. I can understand the strict aliasing error on little endian machine because of the byte swap operation in swap4b, but not memcpy. Can you verify?
inline static void put_ix_float(void *xp, const float *ip) { #ifdef WORDS_BIGENDIAN (void) memcpy(xp, ip, X_SIZEOF_FLOAT); #else swap4b(xp, ip); #endif }
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Parallel-NetCDF/PnetCDF/issues/23#issuecomment-432765693, or mute the thread https://github.com/notifications/unsubscribe-auth/AF16GI--FDmrMARzZLiyEQArSEc3YjjNks5uoKrOgaJpZM4Xznu7 .
-- Jim Edwards
CESM Software Engineer National Center for Atmospheric Research Boulder, CO
OK. That makes sense. I committed a fix (#24) to the master branch using the below solution suggested by @rzurob (thanks!). In addition, I also removed all the inline directives. Let me know if this works for you.
ncx.lo: LTCOMPILE += -fno-strict-aliasing
It works for the ibm compiler, however now when I try to use the pgi compiler on the same system I am still getting that -fno-strict-aliasing flag.
Hi @jedwards4b Please send me your config.log file. FYI The latest configure.ac and Makefile.am checks whether PGI based compilers are used and selects the compile flag understood by PGI. https://github.com/Parallel-NetCDF/PnetCDF/blob/3999f6e31d927c8b1dd771bbe7a3b948640a9ae6/src/drivers/common/Makefile.am#L51-L53
checking if C compiler is pgcc... no
mpicc --version
pgcc 18.7-0 linuxpower target on Linuxpower PGI Compilers and Tools Copyright (c) 2018, NVIDIA CORPORATION. All rights reserved.
In my script, I use "-V". Could you run "mpicc -V" and let me know?
I see the problem - in acinclude.m4:
AC_DEFUN([UD_CHECK_PGCC],[
AC_CACHE_CHECK([if C compiler is pgcc], [ac_cv_cc_compiler_pgcc],
[ac_cv_cc_compiler_pgcc=no
_CC_VER=`$MPICC -V -c 2> /dev/null`
_CC_VENDOR=`echo $_PGCC_VER | cut -d' ' -f1`
if test "x${_CC_VENDOR}" = xpgcc ; then
ac_cv_cc_compiler_pgcc=yes
fi
unset _CC_VER
unset _CC_VENDOR
])
])
Shouldn't _PGCC_VER be _CC_VER?
Yes, changing _PGCC_VER to _CC_VER above solves the problem.
checking if C compiler is pgcc... yes
You forgot to change _PGCC_VER in setting _CC_VENDOR.
This is strange, because _PGCC_VER and _PGCC_VENDOR are used locally. Maybe the names conflict with one used by PGI environment?
Can you reverse your change and run autoreconf and your configure command again?
I posted the original before - here is the corrected code:
AC_DEFUN([UD_CHECK_PGCC],[
AC_CACHE_CHECK([if C compiler is pgcc], [ac_cv_cc_compiler_pgcc],
[ac_cv_cc_compiler_pgcc=no
_CC_VER=`$MPICC -V -c 2> /dev/null`
_CC_VENDOR=`echo $_CC_VER | cut -d' ' -f1`
if test "x${_CC_VENDOR}" = xpgcc ; then
ac_cv_cc_compiler_pgcc=yes
fi
unset _CC_VER
unset _CC_VENDOR
])
])
I don't see any reason to change it the other way but I have no objection.
I just realized that I have a local committed change that has not been pushed to the repo. Let me fix this and let you know.
I just pushed a new commit to acinclude.m4 to the master branch that should fixes the problem.
yes that works, thanks
On Mon, Nov 12, 2018 at 8:24 PM Wei-keng Liao [email protected] wrote:
I just pushed a new commit to acinclude.m4 to the master branch that should fixes the problem.
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/Parallel-NetCDF/PnetCDF/issues/23#issuecomment-438118909, or mute the thread https://github.com/notifications/unsubscribe-auth/AF16GK8JPDHedZI2SJhMTNcnqaAttBq9ks5uujtpgaJpZM4Xznu7 .
-- Jim Edwards
CESM Software Engineer National Center for Atmospheric Research Boulder, CO
Hi, @rzurob
It seems -fno-strict-aliasing is only recognizable by the XL compiler V13, but not V12.
In V12, is the corresponding one -qalias=noansi?
(not -qalias=norestrict)
Hello. Yes, you'll want -qalias=noansi for V12.