layout icon indicating copy to clipboard operation
layout copied to clipboard

fix g++ warnings and always_inline usage

Open joshua-henderson opened this issue 4 years ago • 13 comments

Fix several g++ warnings. Also address the use of the gcc always_inline attribute which misbehaves on at least ARM9.

joshua-henderson avatar Aug 27 '19 18:08 joshua-henderson

Good question. I have not tried clang, only various different versions of g++.

joshua-henderson avatar Nov 14 '19 18:11 joshua-henderson

I don't have any ARM9 hardware to test on, unfortunately. But maybe you can do it.

I've created a new branch named test. It has some fixes, changes, and adds an automation script to make it easier to run the tests on Linux/POSIX. You can edit the bash script file named tool and near the bottom, change the name of the compilers in the list of config cases (look for configs=).

(I currently have it hard-coded to the name of the compilers in my test VM, like clang-7. You'll want to change it to whatever you have.)

Once you've set it up, try running:

$ ./tool battery

And it should build and run the battery of tests program for those configs.

I've also fixed some of the warnings (in a slightly different fashion, with C++98 compat) that your changes fix.

Let me know what the results are.

randrew avatar Nov 14 '19 19:11 randrew

Good news - the test picks up the failure on ARM9.

Because of cross compiling, it was quicker to skip using tools to compile and run. I just show the command line instead.

$ arm-buildroot-linux-gnueabi-g++ --version arm-buildroot-linux-gnueabi-g++.br_real (Buildroot 2019.05) 7.4.0 Copyright (C) 2017 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

test branch (e9bc8f2c72):

$ arm-buildroot-linux-gnueabi-g++ test_layout.c -DDEBUG -g -o test_layout

# ./test_layout
Running tests
 * testing_sanity
 * simple_fill
 * reserve_capacity
 * multiple_uninserted
 * column_even_fill
 * row_even_fill
 * fixed_and_fill
 * simple_margins_1
 * nested_boxes_1
    Iteration #1
Failed test at line 430 in test_nested_boxes_1
Aborted

test branch (e9bc8f2c72) + cherry-pick 7c37742490:

$ arm-buildroot-linux-gnueabi-g++ test_layout.c -DDEBUG -g -o test_layout2

# ./test_layout2
Running tests
 * testing_sanity
 * simple_fill
 * reserve_capacity
 * multiple_uninserted
 * column_even_fill
 * row_even_fill
 * fixed_and_fill
 * simple_margins_1
 * nested_boxes_1
    Iteration #1
    Iteration #2
    Iteration #3
    Iteration #4
    Iteration #5
 * deep_nest_1
 * many_children_1
 * child_align_1
 * child_align_2
 * wrap_row_1
 * wrap_row_2
 * wrap_row_3
 * wrap_row_4
 * wrap_row_5
 * wrap_column_1
 * wrap_column_2
 * wrap_column_3
 * wrap_column_4
Finished tests

So, in summary the only difference is commit 7c37742490. I think this pretty clearly isolates the issue to a side effect of using always_inline. Note that I do not see this issue on other ARM cores like Cortex-A5 or even on x86_64.

It's worth noting if I load up the good case and the bad case in gdb and break at test_layout:430 - all local variables are the exact same value! Except, if I step then only one of the conditions fails. They are operating on the exact same data (or so it appears), yet one of the comparisons fails in LTEST_VEC4EQ().

I looked into this a bit more. Above, in both cases I compiled without any gcc optimization.

If I instead turn on seemingly any gcc optimization level (-01, -02, -03) in the failure case, it now magically works just fine. Which, takes me to the gcc definition of the attribute (https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html) for a clue.

always_inline Generally, functions are not inlined unless optimization is specified. For functions declared inline, this attribute inlines the function independent of any restrictions that otherwise apply to inlining. Failure to inline such a function is diagnosed as an error. Note that if such a function is called indirectly the compiler may or may not inline it depending on optimization level and a failure to inline an indirect call may or may not be diagnosed.

OK, so what happens if I take the inline off.

$ git diff
diff --git a/layout.h b/layout.h
index 653a5c3..766294f 100644
--- a/layout.h
+++ b/layout.h
@@ -31,7 +31,7 @@
 // 'static inline' for things we always want inlined -- the compiler should not
 // even have to consider not inlining these.
 #if defined(__GNUC__) || defined(__clang__)
-#define LAY_STATIC_INLINE __attribute__((always_inline)) static inline
+#define LAY_STATIC_INLINE __attribute__((always_inline)) static
 #elif defined(_MSC_VER)
 #define LAY_STATIC_INLINE __forceinline static
 #else
@@ -468,7 +468,7 @@ LAY_STATIC_INLINE void lay_get_rect_xywh(
 #endif
 
 #if defined(__GNUC__) || defined(__clang__)
-#define LAY_FORCE_INLINE __attribute__((always_inline)) inline
+#define LAY_FORCE_INLINE __attribute__((always_inline))
 #ifdef __cplusplus
 #define LAY_RESTRICT __restrict
 #else
$ arm-buildroot-linux-gnueabi-g++ test_layout.c -DDEBUG -g -o test_layout
In file included from test_layout.c:3:0:
layout.h:1124:12: warning: always_inline function might not be inlinable [-Wattributes]
 lay_scalar lay_arrange_wrapped_overlay_squeezed(
            ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
layout.h:1088:6: warning: always_inline function might not be inlinable [-Wattributes]
 void lay_arrange_overlay_squeezed_range(
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
layout.h:1052:6: warning: always_inline function might not be inlinable [-Wattributes]
 void lay_arrange_overlay(lay_context *ctx, lay_id item, int dim)
      ^~~~~~~~~~~~~~~~~~~
layout.h:915:6: warning: always_inline function might not be inlinable [-Wattributes]
 void lay_arrange_stacked(
      ^~~~~~~~~~~~~~~~~~~
... on and on

Maybe always_inline is indeed inlining even when gcc would not normally be choosing to do so safely. I didn't go as far as looking at the generated code difference, but I suspect that would reveal the real problem. Maybe an alternate fix is to just remove always_inline when optimization is off? Using always_inline still seems risky here in any event.

joshua-henderson avatar Nov 14 '19 21:11 joshua-henderson

Wow, thanks for the in-depth report! That's pretty interesting. I'm really surprised GCC would continue to inline in an unsafe condition -- it's not supposed to do that.

There might be a bug in the layout code that isn't being shown by warnings or tests.

Can you try it with undefined behavior and address sanitizer enabled? My tool script enables those, but you can do it manually. Add -fsanitize=address -fsanitize=undefined to your gcc flags. You'll need gcc 6.0.0 or clang 3.9.0 or later.

randrew avatar Nov 14 '19 21:11 randrew

On test branch as is.

$ arm-buildroot-linux-gnueabi-g++ test_layout.c -DDEBUG -g -fsanitize=address -o test_layout
# ./test_layout
Running tests
 * testing_sanity
 * simple_fill
 * reserve_capacity
 * multiple_uninserted
 * column_even_fill
 * row_even_fill
 * fixed_and_fill
 * simple_margins_1
 * nested_boxes_1
    Iteration #1
    Iteration #2
    Iteration #3
    Iteration #4
    Iteration #5
 * deep_nest_1
 * many_children_1
 * child_align_1
 * child_align_2
 * wrap_row_1
 * wrap_row_2
 * wrap_row_3
 * wrap_row_4
 * wrap_row_5
 * wrap_column_1
 * wrap_column_2
 * wrap_column_3
 * wrap_column_4
Finished tests
$ arm-buildroot-linux-gnueabi-g++ test_layout.c -DDEBUG -g -fsanitize=undefined -o test_layout
# ./test_layout
Running tests
 * testing_sanity
 * simple_fill
 * reserve_capacity
 * multiple_uninserted
 * column_even_fill
 * row_even_fill
 * fixed_and_fill
 * simple_margins_1
 * nested_boxes_1
    Iteration #1
Failed test at line 430 in test_nested_boxes_1
Aborted

And adding both flags at the same time is also a pass with no errors. So, I think this instrumentation is masking the problem.

joshua-henderson avatar Nov 14 '19 22:11 joshua-henderson

OK, that's kind of interesting. I'm trying to guess at what may be going on.

Does your build environment have stuff like stack protection or fortify enabled by default? I'm wondering if doing stuff like enabling address sanitizer or other options is causing the stack frame for that code to change alignment or have it shuffled around, thus exposing or masking the problem depending on what options are enabled or disabled.

On most Linux build environments you can disable stack protection (which may or may not be on by default) with something like -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=0 -fno-stack-protector

And you can turn it on explicitly (or make it stronger) with -D_FORTIFY_SOURCE=2 -fstack-protector-strong

Another thing to try is to disable PIE/PIC: -no-pie -fno-pie Or to enable it: -pie -fpie -Wl,-pie

randrew avatar Nov 14 '19 22:11 randrew

I went ahead and root caused it at this point.

In an effort to try to isolate the problem, I got it down to a single function. With everything else inlined as usual except this function, the problem goes away.

static /*LAY_FORCE_INLINE*/
lay_scalar lay_calc_stacked_size(
        lay_context *ctx, lay_id item, int dim)
{
    const int wdim = dim + 2;
    lay_item_t *LAY_RESTRICT pitem = lay_get_item(ctx, item);
    lay_scalar need_size = 0;
    lay_id child = pitem->first_child;
    while (child != LAY_INVALID_ID) {
        lay_item_t *pchild = lay_get_item(ctx, child);
        lay_vec4 rect = ctx->rects[child];
        need_size += rect[dim] + rect[2 + dim] + pchild->margins[wdim];
        child = pchild->next_sibling;
    }
    return need_size;
}

If I put the inline back the assembly for

lay_vec4 rect = ctx->rects[child];

is

        ldr     r3, [fp, #-40]
        ldr     r2, [r3, #4]
        ldr     r3, [fp, #-72]
        lsl     r3, r3, #3
        add     r3, r2, r3
        ldrh    r2, [r3]
        ldrh    ip, [r3, #2]
        lsl     ip, ip, #16
        orr     r2, ip, r2
        mov     r0, r2
        ldrh    r2, [r3, #4]
        ldrh    r3, [r3, #6]
        lsl     r3, r3, #16
        orr     r3, r3, r2
        mov     r1, r3
        sub     r3, fp, #468
        strd    r0, [r3, #2]

Now, as a nifty side effect, if I put volatile on the child variable and make no other change, I instead get this assembly (and it masks the problem just like removing always_inline masks the problem and instrumenting the code masks the problem).

        ldr     r3, [fp, #-40]
        ldr     r2, [r3, #4]
        ldr     r3, [fp, #-460]
        lsl     r3, r3, #3
        add     r3, r2, r3
        ldrh    r2, [r3]
        ldrh    ip, [r3, #2]
        lsl     ip, ip, #16
        orr     r2, ip, r2
        mov     r0, r2
        ldrh    r2, [r3, #4]
        ldrh    r3, [r3, #6]
        lsl     r3, r3, #16
        orr     r3, r3, r2
        mov     r1, r3
        sub     r3, fp, #468
        strd    r0, [r3]

If you notice in the first version without volatile (original code which fails), that strd at the end with an offset of 2, is suspicious. Moving into a doubleword from offset 2? That 2 offset, comes directly from the aligned value on the vector type.

typedef int16_t lay_vec4 __attribute__ ((__vector_size__ (8), aligned(2)));

My take on what's happening here is the same thing that used to happen with packed on structs often. You get unaligned values and gcc does not handle it properly. The use of vector here is doing the same thing. It's 2 byte aligned and gcc isn't handling it. What gcc would know how to handle correctly is what you did for WIN32.

struct lay_vec4 {
    lay_scalar xyzw[4];
    const lay_scalar& operator[](int index) const
    { return xyzw[index]; }
    lay_scalar& operator[](int index)
    { return xyzw[index]; }
};

Or, take the alignment off the lay_vec4. At this point removing always_inline or adding in volatile are not fixing the real problem. They are just covering up the alignment issue for the test cases.

joshua-henderson avatar Nov 15 '19 22:11 joshua-henderson

Great job! That must be it.

I think the aligned attribute is not doing what it's supposed to be doing here for gcc:

This attribute specifies a minimum alignment (in bytes) for variables of the specified type. ... The aligned attribute can only increase the alignment; but you can decrease it by specifying packed as well.

This seems like a bug in gcc.

Either way, layout is no longer taking advantage of alignment for SIMD purposes, so these alignment specifiers are superfluous. I'll remove them and probably make some other changes.

But maybe this is something that should be reported? I could also be misunderstanding what this specifier is supposed to do, but it doesn't seem like you should be able to use it to get the compiler to generate bad code in this case. If int16_t requires more than 2 bytes of alignment then this specifier shouldn't be lowering it. (Which is why it's superfluous right now, anyway.)

randrew avatar Nov 15 '19 23:11 randrew

Sorry, another question:

Does the misaligned access occur if -DLAY_FLOAT is specified? (This causes float coordinates to be used instead of int16_t.)

I'm trying to find a way to easily remove the alignment specifier without having the compiler insert extra padding (which shouldn't be necessary for int16_t if the size and margins fields have their order swapped, but with float coordinates this won't work and there will still be padding inserted.)

randrew avatar Nov 16 '19 21:11 randrew

No it does not fail with float. It also does not fail if:

typedef int16_t lay_vec4 __attribute__ ((__vector_size__ (8), aligned(4)));

joshua-henderson avatar Nov 18 '19 23:11 joshua-henderson

Thanks. Sorry I haven't finished this yet -- I'm traveling right now and don't have access to my normal testing environment (a bunch of VMs.)

randrew avatar Nov 29 '19 07:11 randrew

I'm back from traveling. I have a way to work around this by reordering two fields and changing the alignment requirement, and it shouldn't add any extra padding. But the memory layout will be different, so hopefully nobody was relying on that not changing.

randrew avatar Dec 19 '19 11:12 randrew

nice

mgood7123 avatar Mar 25 '23 16:03 mgood7123