test_uds_edge fails on big endian architectures with kernel 5.15

Open bluca opened this issue 2 years ago • 11 comments

Since kernel 5.15 became available in Debian, test_uds_edge started failing on big endian machines (ppc, ppc64, ia64, sparc64):

test-dispatch: ../src/util/test-dispatch.c:181: test_uds_edge: Assertion `c_assert_result && "r == sizeof(b)"' failed.
Aborted

recv() is returning -1 with errno set to EINVAL here: https://github.com/bus1/dbus-broker/blob/main/src/util/test-dispatch.c#L181

Breakpoint 1, test_uds_edge (run=0) at ../src/util/test-dispatch.c:181
181	                c_assert(r == sizeof(b));
(gdb) p r
$2 = -1
(gdb) p errno
$1 = 22

https://buildd.debian.org/status/package.php?p=dbus-broker

I can reproduce this easily and have access to the affected hardware, but not sure what I am looking for.

bluca avatar Jan 18 '22 00:01 bluca

Catching up on things now. Is this still happening? The Debian builds seem to be more than 3 months old.

dvdhrm avatar Mar 30 '22 10:03 dvdhrm

I haven't done a new upload, but I assume yes - will double check

bluca avatar Mar 30 '22 10:03 bluca

@dvdhrm yup, still happens on an up-to-date debian unstable on ia64

bluca avatar Mar 30 '22 11:03 bluca

Tests are not failing anymore with v30 - @dvdhrm did something change that could have affected that?

https://buildd.debian.org/status/package.php?p=dbus-broker

bluca avatar May 10 '22 23:05 bluca

CFLAGS changed, but not in a meaningful way (I hope...). I very much assume this is a kernel issue and fixed due to a kernel update. The failing test you saw is a test we carry in dbus-broker only to verify a particular kernel behavior we rely on. It has no particular connection to dbus-broker, but we just wanted to make sure we have it around so we see when things break upstream.

I will keep monitoring this, but if this turns out to not reappear, I am happy to close the issue ;) I am almost done with the backlog, so I will have time to deal with this soon.

dvdhrm avatar May 11 '22 06:05 dvdhrm

Seen it again just now with v31 on sparc64:

https://buildd.debian.org/status/fetch.php?pkg=dbus-broker&arch=sparc64&ver=31-1&stamp=1652826419&raw=0

Kernel: Linux 5.15.0-2-sparc64-smp #1 SMP Debian 5.15.5-2 (2021-12-18) sparc64 (sparc64)
Toolchain package versions: binutils_2.38-4 dpkg-dev_1.21.7 g++-11_11.3.0-1 gcc-11_11.3.0-1 libc6-dev_2.33-7 libstdc++-11-dev_11.3.0-1 libstdc++6_12.1.0-2 linux-libc-dev_5.17.6-1+b1

bluca avatar May 17 '22 23:05 bluca

Ah, but this one is different! This time it fails dequeuing the message.

Edit: ah, no, I think I am wrong on this one.

dvdhrm avatar May 18 '22 08:05 dvdhrm

Btw., the initial problem was that recv() returned EINVAL, but only in the case where we drain the queue after a shutdown. I now found the upstream fix for this:

commit f9390b249c90a15a4d9e69fbfb7a53c860b1fcaf
Author: Vincent Whitchurch <[email protected]>
Date:   Fri Nov 19 13:05:21 2021 +0100

    af_unix: fix regression in read after shutdown

I don't know how I missed that fix the last time, maybe it was queued on some branch that I did not consult. I am quite certain net-next did not have that queued, yet. Anyway, this clearly fixes the problem you described initially.

The fix should be part of 5.16:

$ git describe f9390b249
v5.16-rc1-231-gf9390b249c90

Also, I think I was wrong in my previous assumption. The new report is again the same. Not sure why I considered it different, didn't remember exactly what the initial assertion was.
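
For anyone who wants to check a given kernel without building dbus-broker, the behavior can probably be probed with a few lines of C. This is only a minimal sketch, not the code from test-dispatch.c: the helper name `recv_after_shutdown` is made up for illustration, and it assumes that the regression is hit when the receiving socket itself calls shutdown(SHUT_RDWR) before draining already-queued data.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of dbus-broker): queue one byte on an
 * AF_UNIX stream socket pair, shut down the receiving end, then try to
 * drain the queued byte. Returns the recv() result on success, or a
 * negative errno value if any step fails.
 */
ssize_t recv_after_shutdown(void) {
        int s[2];
        char b = 'x';
        ssize_t r;

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, s) < 0)
                return -errno;

        /* Queue the data before the shutdown, so it sits in s[0]'s receive queue. */
        r = send(s[1], &b, sizeof(b), MSG_NOSIGNAL);
        if (r < 0) {
                r = -errno;
        } else if (shutdown(s[0], SHUT_RDWR) < 0) {
                /* Assumption: SHUT_RDWR on the reader is what triggers the regression. */
                r = -errno;
        } else {
                /* The queued byte should still be readable after the shutdown. */
                r = recv(s[0], &b, sizeof(b), 0);
                if (r < 0)
                        r = -errno;
        }

        close(s[0]);
        close(s[1]);
        return r;
}
```

On a kernel with the fix, a caller should see a return value of 1. On an affected kernel the expectation is -EINVAL, matching the errno 22 from the gdb session in the original report.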

dvdhrm avatar May 18 '22 08:05 dvdhrm

Your newest report shows 5.15. I assume it does not have the fix backported, yet.

dvdhrm avatar May 18 '22 08:05 dvdhrm

Looks like that was backported to v5.15.9 so indeed it's not there yet. I have no control over the kernel of the build instances, so can't do much about it other than wait. But it's good news as it seems it will be solved soon.

commit 80d709875d920f7ca959040457b7393df706fe44
Author: Vincent Whitchurch <[email protected]>
Date:   Fri Nov 19 13:05:21 2021 +0100

    af_unix: fix regression in read after shutdown
    
    [ Upstream commit f9390b249c90a15a4d9e69fbfb7a53c860b1fcaf ]

bluca avatar May 18 '22 11:05 bluca

Perfect! I will leave this open until the problem no longer appears.

dvdhrm avatar May 18 '22 11:05 dvdhrm

I am closing this as solved. The upstream kernel fix is now backported to the stable trees.

Thanks a lot for the report!

dvdhrm avatar Aug 08 '23 07:08 dvdhrm