Test failures on s390x: endianness problems?
In addition to the hopefully easy 32-bit issue (#209), I'm wondering whether big endian systems might be entirely busted. That's hard to say for sure since s390x is the only one big endian system available on the https://ci.debian.net/ infrastructure…
I'll just quote one occurrence:
=== RUN TestConfigureNAT
nftables_test.go:313: message 2: got -- want
02 00 00 00 -- 02 00 00 00
! 00 08 00 01 -- 08 00 01 00
6e 61 74 00 -- 6e 61 74 00
! 00 08 00 02 -- 08 00 02 00
00 00 00 00 -- 00 00 00 00
--
In ! lines, see how byte pairs are swapped. There are many more occurrences of this issue.
Here's an apparently different issue, but maybe that's the same reason (with the length being misread due to swapped bytes):
=== RUN TestGetRules
nftables_test.go:831: message 0: got -- want
02 00 00 00 -- 02 00 00 00
! 00 0b 00 01 -- 0b 00 01 00
66 69 6c 74 -- 66 69 6c 74
65 72 00 00 -- 65 72 00 00
! 00 0a 00 02 -- 0a 00 02 00
69 6e 70 75 -- 69 6e 70 75
74 00 00 00 -- 74 00 00 00
--
nftables_test.go:853: invalid attribute; length too short or too large
--- FAIL: TestGetRules (0.00s)
and another one (might also be due to swapped bytes):
=== RUN TestListChains
nftables_test.go:1550: error returned from TestDial unexpected header type: got unknown(0), want unknown(2563)
--- FAIL: TestListChains (0.00s)
There's a final issue that looks like bytes are getting stashed in the wrong direction, and not just swapped:
=== RUN TestOtherTypes
binaryutil_test.go:136: PutInt32 failure, expected: []byte{0x78, 0x56, 0x34, 0x12}, got: []byte{0x12, 0x34, 0x56, 0x78}
--- FAIL: TestOtherTypes (0.00s)
FAIL
The full log is available at: https://ci.debian.net/data/autopkgtest/testing/s390x/g/golang-github-google-nftables/29113914/log.gz and I'm attaching it for reference. golang-github-google-nftables_s390x_29113914.log
I think the issue might be that the underlying mdlayher/netlink library is not big-endian compatible: https://github.com/mdlayher/netlink/issues/201 (many tests are skipped on big-endian systems in that package).
It is big endian compatible, but a lot of the tests use byte fixtures that are little endian. I don't have access to any big endian machines, but PRs are welcome.
Thanks for the reassuring answer regarding the code's correctness. I've decided to go ahead and upload a package that allows the test suite to fail on big endian archs.
I'm not really a Go developer or a porter, so I can't really provide a pull request to fix the tests. However, I do have access to a s390x porter box (via Debian), and can test any commits or branches you'd like me to.
Thanks for having a look, @mdlayher. Please let us know here when a new netlink release is available and I can pull that in :)
No worries! I will try to do that today.
I've released netlink v1.7.1 and genetlink v1.3.1 which fix all known endianness test failures. A lot of my tests hardcode fixed little endian byte dumps which isn't ideal but is useful for development.
Thanks, now pulled in!
@CyrilBrulebois Can you check which issues remain, if any, please?
@stapelberg Sure thing, but as far as I understand:
- netlink is BE compatible;
- netlink's tests were not BE compatible; that's confirmed to be fixed in the last release.
- nftables is believed to be BE compatible as well?
- nftables's tests are not BE compatible; the updated dependency on netlink doesn't change that, so the failures are exactly the same as before (at least from glancing at them, I can do a more systematic review if you prefer).
I only had a brief look thus far, maybe I have misinterpreted what I was seeing.
I’m not sure if I get a chance to have a closer look over the next few weeks, so any help with this is appreciated.
Hi @CyrilBrulebois and @stapelberg,
I have created a draft PR #218 with a fix approach for one test. In the log file, I see that there are a lot of failed test cases and I can change them all. Unfortunately, as there are a lot of tests to change, this requires time and I would like to get this approach reviewed and tested against a BE system before proceeding with a complete refactor.
If you can, please let me know whether this fixed the test on a BE system and whether this approach makes sense. If you have ideas how to best tackle this issue (e.g. skip tests on BE systems? or something else?), feel free to let me know.
I don’t have time to test on s390x currently, so if @CyrilBrulebois could give the PR a shot, that’d be great! Thanks.