Tests fail using NVIDIA HPC compilers (PGI compilers)

Open olupton opened this issue 3 years ago • 8 comments

The fmt test suite does not pass using the 22.3 NVIDIA HPC compilers. I tested this with today's master (682e097bee9aaa88e9c9826e72d87c2ec1b1226b) and confirmed I could reproduce using NVIDIA's own Docker images that ship their compilers:

$ docker run -it nvcr.io/nvidia/nvhpc:22.3-devel-cuda11.6-ubuntu20.04
root@411ece11247c:~# git clone https://github.com/fmtlib/fmt.git
Cloning into 'fmt'...
...
root@411ece11247c:~# cd fmt/
root@411ece11247c:~/fmt# mkdir build
root@411ece11247c:~/fmt# cd build
root@411ece11247c:~/fmt/build# cmake .. -DCMAKE_CXX_FLAGS="-tp haswell"
-- CMake version: 3.16.3
-- The CXX compiler identification is PGI 22.3.0
-- Check for working CXX compiler: /opt/nvidia/hpc_sdk/Linux_x86_64/22.3/compilers/bin/nvc++
-- Check for working CXX compiler: /opt/nvidia/hpc_sdk/Linux_x86_64/22.3/compilers/bin/nvc++ -- works
…
root@411ece11247c:~/fmt/build# make
Scanning dependencies of target fmt
[  1%] Building CXX object CMakeFiles/fmt.dir/src/format.cc.o
…
(there are some warnings here)
root@411ece11247c:~/fmt/build# make test
Running tests...
Test project /root/fmt/build
      Start  1: args-test
 1/20 Test  #1: args-test ........................   Passed    0.01 sec
…
80% tests passed, 4 tests failed out of 20

Total Test time (real) =   0.22 sec

The following tests FAILED:
	  3 - chrono-test (Failed)
	  7 - format-test (Failed)
	 12 - printf-test (Failed)
	 17 - xchar-test (Failed)
Errors while running CTest
make: *** [Makefile:97: test] Error 8

(I had to add -DCMAKE_CXX_FLAGS="-tp haswell" because these compilers use the equivalent of -march=native by default, but they seem to mis-detect the AVX512 capabilities of my Ice Lake laptop CPU and produce binaries that fail with illegal instruction errors. This might not be needed on other systems.)

I checked 22.3 first because this is the version we currently use in production. I also checked the latest version, 22.7, using the nvcr.io/nvidia/nvhpc:22.7-devel-cuda11.7-ubuntu22.04 Docker image and see the same four tests failing.

If I additionally pass -DCMAKE_BUILD_TYPE=Debug then the printf-test test passes but the other three still fail.

In the debug build the format_test.precision test appears to fail when handling very small/denormal double values, while chrono_test.special_durations and format_test.format_nan have some mismatches between inf and nan. xchar_test.escape_string ends up with an extra backslash.

In the default (non-debug) build there are additional failures in printf_test.pointer, xchar_test.format, xchar_test.format_utf8_precision, xchar_test.format_to, xchar_test.join, xchar_test.streamed, xchar_test.chrono, xchar_test.ostream, format_test.wide_format_to_n, chrono_test.time_point, locale_test.wformat and locale_test.chrono_weekday (list derived from the build with 22.7).

olupton avatar Aug 08 '22 12:08 olupton

I also see that NVHPC 22.3 does not play nicely with fmt in C++17 mode:

root@411ece11247c:~/fmt# cat foo.cpp
#define FMT_HEADER_ONLY
#include <fmt/format.h>
#include <iostream>
int main() {
  std::cout << fmt::format("{:.1f}", 1.0) << std::endl;
  return 0;
}
root@411ece11247c:~/fmt# nvc++ -Iinclude foo.cpp -o foo -std=c++17
root@411ece11247c:~/fmt# gdb foo
...
(gdb) r
...
Program received signal SIGABRT, Aborted.
0x00007f58fb79603b in raise () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007f58fb79603b in raise () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f58fb775859 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f58fcdb2911 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007f58fcdbe38c in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007f58fcdbe3f7 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x000000000041c288 in fmt::v9::detail::assert_fail () at /root/fmt/include/fmt/format-inl.h:40
#6  0x000000000041b36b in fmt::v9::detail::grisu_gen_digits () at /root/fmt/include/fmt/format.h:2613
#7  0x00000000004225a7 in fmt::v9::detail::format_float<double> () at /root/fmt/include/fmt/format.h:3139
#8  0x0000000000428665 in fmt::v9::detail::write<char, fmt::v9::appender, double, 0> () at /root/fmt/include/fmt/format.h:3219
#9  0x0000000000416e03 in fmt::v9::detail::arg_formatter<char>::operator()<double> () at /root/fmt/include/fmt/format.h:3383
#10 0x0000000000433285 in fmt::v9::visit_format_arg<fmt::v9::detail::arg_formatter<char>&, fmt::v9::basic_format_context<fmt::v9::appender, char> > () at /root/fmt/include/fmt/core.h:1646
#11 0x000000000043554e in fmt::v9::detail::vformat_to<char>(fmt::v9::detail::buffer<char>&, fmt::v9::basic_string_view<char>, fmt::v9::basic_format_args<fmt::v9::basic_format_context<std::conditional<std::is_same<fmt::v9::type_identity<char>::type, char>::value, fmt::v9::appender, std::back_insert_iterator<fmt::v9::detail::buffer<fmt::v9::type_identity<char>::type> > >::type, fmt::v9::type_identity<char>::type> >, fmt::v9::detail::locale_ref)::format_handler::on_format_specs () at /root/fmt/include/fmt/format.h:4121
#12 0x000000000043221a in fmt::v9::detail::parse_replacement_field<char, fmt::v9::detail::vformat_to<char>(fmt::v9::detail::buffer<char>&, fmt::v9::basic_string_view<char>, fmt::v9::basic_format_args<fmt::v9::basic_format_context<std::conditional<std::is_same<fmt::v9::type_identity<char>::type, char>::value, fmt::v9::appender, std::back_insert_iterator<fmt::v9::detail::buffer<fmt::v9::type_identity<char>::type> > >::type, fmt::v9::type_identity<char>::type> >, fmt::v9::detail::locale_ref)::format_handler&> ()
    at /root/fmt/include/fmt/core.h:2659
#13 0x0000000000431fd6 in fmt::v9::detail::parse_format_string<false, char, fmt::v9::detail::vformat_to<char>(fmt::v9::detail::buffer<char>&, fmt::v9::basic_string_view<char>, fmt::v9::basic_format_args<fmt::v9::basic_format_context<std::conditional<std::is_same<fmt::v9::type_identity<char>::type, char>::value, fmt::v9::appender, std::back_insert_iterator<fmt::v9::detail::buffer<fmt::v9::type_identity<char>::type> > >::type, fmt::v9::type_identity<char>::type> >, fmt::v9::detail::locale_ref)::format_handler> ()
    at /root/fmt/include/fmt/core.h:2684
#14 0x000000000041c5b1 in fmt::v9::detail::vformat_to<char> () at /root/fmt/include/fmt/format.h:4125
#15 0x0000000000432692 in fmt::v9::vformat[abi:cxx11] () at /root/fmt/include/fmt/format-inl.h:1472
#16 0x0000000000432837 in fmt::v9::format<double> () at /root/fmt/include/fmt/core.h:3199
#17 0x000000000040d4ce in main () at foo.cpp:5

but this issue goes away with 22.7, or if I pass -std=c++14 instead.

olupton avatar Aug 08 '22 13:08 olupton

Please post the actual test failures.

vitaut avatar Aug 08 '22 13:08 vitaut

Here you go: nvhpc22.7_debug.log, nvhpc22.3_debug.log, nvhpc22.7_nodebug.log, nvhpc22.3_nodebug.log.

As noted above, the differences between 22.3 and 22.7 do not seem to be significant here.

After https://github.com/fmtlib/fmt/issues/3028#issuecomment-1208145480 I tried to compile the fmt tests with -DCMAKE_CXX_STANDARD=17 and 22.3, but I get:

[ 14%] Building CXX object test/CMakeFiles/format-test.dir/format-test.cc.o
"/root/fmt/test/format-test.cc", line 2180: error: expression must have a constant value
    EXPECT_ERROR("{:{}}", "width/precision is not integer", int, double);
    ^
"/root/fmt/include/fmt/core.h", line 757: note: attempt to access expired storage
      if (arg_id < num_args_ && types_ && !is_integral_type(types_[arg_id]))
                                                                  ^
"/root/fmt/include/fmt/core.h", line 2272: note: called from:
      context_.check_dynamic_spec(arg_id);
                                 ^
"/root/fmt/include/fmt/core.h", line 2247: note: called from:
      specs_.width_ref = make_arg_ref(arg_id);
                                     ^
"/root/fmt/include/fmt/core.h", line 2445: note: called from:
      FMT_CONSTEXPR void operator()() { handler.on_dynamic_width(auto_id()); }
                                                                ^
"/root/fmt/include/fmt/core.h", line 2434: note: called from:
    handler();
           ^
"/root/fmt/include/fmt/core.h", line 2464: note: called from:
      if (begin != end) begin = parse_arg_id(begin, end, width_adapter{handler});
                                            ^
"/root/fmt/include/fmt/core.h", line 2605: note: called from:
    begin = parse_width(begin, end, handler);
                       ^
"/root/fmt/include/fmt/core.h", line 3037: note: called from:
      auto it = detail::parse_format_specs(begin, end, checker);
                                          ^
"/root/fmt/include/fmt/core.h", line 2742: note: called from:
    return f.parse(ctx);
                  ^
"/root/fmt/include/fmt/core.h", line 2977: note: called from:
      return id >= 0 && id < num_args ? parse_funcs_[id](context_) : begin;
                                                        ^
"/root/fmt/include/fmt/core.h", line 2659: note: called from:
        begin = handler.on_format_specs(adapter.arg_id, begin + 1, end);
                                       ^
"/root/fmt/include/fmt/core.h", line 2684: note: called from:
          begin = p = parse_replacement_field(p - 1, end, handler);
                                             ^
"/root/fmt/test/format-test.cc", line 2146: note: called from:
    fmt::detail::parse_format_string<true>(s, checker);
                                          ^

olupton avatar Aug 08 '22 15:08 olupton

FWIW, a very crude workaround for the issue in https://github.com/fmtlib/fmt/issues/3028#issuecomment-1208145480 seems to be https://github.com/olupton/fmt/commit/203a4cea88b9da4eb2c1786d8a18438e24b427bd. Without it, data::pow10_exponents[index] appears to evaluate to zero with nvhpc/22.3 in C++17 mode. This hack is not thoroughly tested, but it seems to be working for my use case for now...

olupton avatar Aug 08 '22 16:08 olupton

@olupton Do the NVIDIA HPC compilers use the EDG frontend or a Clang-based frontend? The same issues exist when using the MCST LCC compiler (EDG frontend).

Could you please check which section of the executable file contains the array data::pow10_exponents?

phprus avatar Aug 08 '22 17:08 phprus

Closing as it is clearly a compiler bug (please report to NVIDIA). A PR with a workaround would be welcome provided that it's not too intrusive.

vitaut avatar Aug 09 '22 16:08 vitaut

Up to you if you want to leave this closed, but: I agree the issue with pow10_exponents is clearly a compiler bug in 22.3 with c++17, but it seems that they have fixed it in 22.7.

The issues in the fmt test suite that I described in the original report (and I added the logs in the first line of https://github.com/fmtlib/fmt/issues/3028#issuecomment-1208303478) are still there in 22.7, so I think it's clear that the issue(s) there are at least a little different, and it's not clear (to me) that they are also the result of compiler bugs.

Apologies for mixing up these two aspects under the same issue.

olupton avatar Aug 09 '22 19:08 olupton

Reopening to track test failures.

vitaut avatar Aug 09 '22 21:08 vitaut

Fixed by #3043. Maybe close this issue?

phprus avatar Nov 13 '22 10:11 phprus