au icon indicating copy to clipboard operation
au copied to clipboard

Compiler Error: Comparison of integer expressions of different signedness

Open StefanoD opened this issue 7 months ago • 5 comments

Hi, I wanted to upgrade to the latest 0.4.1 version. I installed the prebuilt au_all_units.hh and get following compile error:

../source/Common/Units/include/Units/au.h: In function ‘constexpr uintmax_t au::detail::find_prime_factor(uintmax_t)’:

[2025-04-30T11:22:07.560Z] ../source/Common/Units/include/Units/au.h:2883:19: error: comparison of integer expressions of different signedness: ‘int’ and ‘uintmax_t’ {aka ‘long unsigned int’} [-Werror=sign-compare]

[2025-04-30T11:22:07.560Z]  2883 |         if (p * p > n) {
[2025-04-30T11:22:07.560Z]       |             ~~~~~~^~~

This is a snippet of constexpr std::uintmax_t find_prime_factor(std::uintmax_t n)

It fails with different GCC versions like 9.3 and 12.3

My sub-issue is to enable warnings as errors in your CI: -Wall -Wextra -Werror

StefanoD avatar Apr 30 '25 11:04 StefanoD

I tried to reproduce, after git checkout 0.4.1, by the following method:

bazel test \
  --copt=-Wall \
  --copt=-Wextra \
  --copt=-Werror \
  --copt=-Wsign-compare \
  --config=gcc \
  //au/...:all \
  //release/...:all

However, all of the tests pass, including //release:au_all_units_hh_test.

My goal is to get a reproduction command so that we can make our CI stronger. (We already do enable warnings as errors in CI, and we include all of -Wall -Wextra -Werror as well.)

Maybe what this suggests is that our coverage for //release/... is lacking? Let's try this approach:

  1. Look at your code that led to this error.
  2. Try coming up with extra test cases that we can add to release/common_test_cases.hh.
  3. Run the reproduction command above and see if you can get it to fail.

Once we get the reproduction nailed down, we can fix the problem.

chiphogg avatar Apr 30 '25 13:04 chiphogg

Actually, I just thought to check the code itself. The error is surprising, but I think I understand what's going on.

The type of p is const uint16_t&, and the type of n is std::uintmax_t, so we should be comparing unsigned to unsigned, right? Well, p undergoes integer promotion to int before we perform *, because the arithmetic operators cannot operate directly on uint16_t. So, this known-unsigned expression gets automatically converted to a signed expression --- which nevertheless is known to hold an unsigned value --- but the signed expression comparison with the other unsigned value trips this compiler warning.

Phew! 😅

The real puzzler here is why I can't get this to reproduce. According to the rules of integer promotion, it feels like this should be happening all the time.

Let me know if you are able to come up with any good test cases --- I would love to get this fixed. And thanks for raising it in the first place!

chiphogg avatar Apr 30 '25 14:04 chiphogg

What me surprises is, that the compiler error is triggered somewhere in our project which doesn't directly relate to your units lib, but only indirectly by including the au header with many indirections. I tried to get a minimal example, but failed in the last 30-45 minutes and now I need to go home. 😅 Our units unit test are passing...

I guess you just need a unit test which uses this function directly in order to trigger the compile error?

StefanoD avatar Apr 30 '25 15:04 StefanoD

Weirdly, even a direct test doesn't seem to expose it!

TEST(CommonSingleFile, CanFindPrimeFactors) {
    EXPECT_EQ(detail::find_prime_factor(1'000'003u), 1'000'003u);
}

Here's a plan. First, we'll leave it open for a little while we try to figure out how to get the coverage we need. But if we can't reproduce it, then we'll have to settle for fixing this specific problem, which should be straightforward. I've connected it to the 0.4.2 milestone so that we'll at least get this fixed by then.

chiphogg avatar Apr 30 '25 15:04 chiphogg

I'm at home and I try to reproduce the compile error by entering numbers which enters this if-statement:

if (p * p > n) {
       return n;
}

You can trigger this branch with the numbers 63031 and 63059. But it still compiles.

Honestly, I think it might be a compiler bug that the comparison of integer expressions of different signedness is not detected correctly.

You can immediately trigger the compiler error by changing the code to:

const auto result = p*p;

if (result > n) {
       return n;
}

This should be in theory equivalent code, isn't it?

StefanoD avatar Apr 30 '25 19:04 StefanoD

Here's a very concise repro: https://godbolt.org/z/jz1oTh8xx

I'll ask around and see if I can get some insights.

chiphogg avatar Apr 30 '25 21:04 chiphogg

(To be clear: I meant a repro of the generic problem, where the signedness comes into play for the lvalue but not the prvalue. Still no repro for the Au bug!)

chiphogg avatar Apr 30 '25 21:04 chiphogg

Meanwhile, on the Au repro front: @StefanoD, do you have a precise compiler version and exact/complete list of compiler flags in your build that do show this error?

chiphogg avatar Apr 30 '25 22:04 chiphogg

@chiphogg I'm currently on vacation and return on Monday to my office again. But let me summarize:

  • We have GCC 9.3 and GCC 13.2 CI pipeline with the flags -Wall -Wextra -Werror
  • The compiler error does not show up in our core lib which uses the Au lib
  • Our core lib has many unit tests and they compile without problems
  • We don't use PrimeFactorization in our entire code base
  • The compiler error shows up in a specific project with uses our core lib. It even doesn't directly use the Au lib.
  • And: In your minimal Godbolt example square_less_than_method_A() and square_less_than_method_B produce the exact same assembly code! - So there must be a bug in GCC where the compiler error is triggered only in rare edge cases

StefanoD avatar May 01 '25 07:05 StefanoD

Thanks for the update! Enjoy your vacation. 😎 🌴 Meanwhile, I have some comments to share.

  • We don't use PrimeFactorization in our entire code base

You probably don't have any direct uses, but it would be really hard to avoid using it at least indirectly. I think it might be possible if you only use base SI units with no prefixes, but this is likely getting called under the hood all the time.

  • And: In your minimal Godbolt example square_less_than_method_A() and square_less_than_method_B produce the exact same assembly code! - So there must be a bug in GCC where the compiler error is triggered only in rare edge cases

This is actually exactly what I would expect (conditional on the code actually compiling, of course). The codegen should be the same in both cases. But, evidently, making a named variable makes the problem too hard to ignore, while using an inline expression (which is still comparing an int to a uint64_t!) is something that most compilers feel fine overlooking.


So: what to do? Well, I feel more and more comfortable that we do have good coverage for -Wsign-compare, and that this is just a very hard case to actually trigger. So this moves me in the direction of simply fixing this one instance, without changing CI. @StefanoD, when you get back, try running your builds on #398 so we can see whether there are any other weird issues. Once I hear back from you, I'll put this out for review.

chiphogg avatar May 01 '25 13:05 chiphogg

Thanks, this patch works. 🙂 Question: It would make sense to report the compiler bug to the GCC and clang guys. Did you already do this or shall I do this?

StefanoD avatar May 05 '25 08:05 StefanoD

I didn't do this. If you wanted to do this it'd be great, thanks!

chiphogg avatar May 05 '25 12:05 chiphogg

In case you are curious:

GCC bug report (closed as duplicate) LLVM bug report

StefanoD avatar May 06 '25 08:05 StefanoD

Thanks for sharing the links!

chiphogg avatar May 06 '25 12:05 chiphogg