nftables icon indicating copy to clipboard operation
nftables copied to clipboard

Test failures on s390x: endianness problems?

Open CyrilBrulebois opened this issue 3 years ago • 11 comments

In addition to the hopefully easy 32-bit issue (#209), I'm wondering whether big endian systems might be entirely busted. That's hard to say for sure since s390x is the only one big endian system available on the https://ci.debian.net/ infrastructure…

I'll just quote one occurrence:

=== RUN   TestConfigureNAT
    nftables_test.go:313: message 2: got -- want
          02 00 00 00 -- 02 00 00 00
        ! 00 08 00 01 -- 08 00 01 00
          6e 61 74 00 -- 6e 61 74 00
        ! 00 08 00 02 -- 08 00 02 00
          00 00 00 00 -- 00 00 00 00
           -- 

In ! lines, see how byte pairs are swapped. There are many more occurrences of this issue.

Here's an apparently different issue, but maybe that's the same reason (with the length being misread due to swapped bytes):

=== RUN   TestGetRules
    nftables_test.go:831: message 0: got -- want
          02 00 00 00 -- 02 00 00 00
        ! 00 0b 00 01 -- 0b 00 01 00
          66 69 6c 74 -- 66 69 6c 74
          65 72 00 00 -- 65 72 00 00
        ! 00 0a 00 02 -- 0a 00 02 00
          69 6e 70 75 -- 69 6e 70 75
          74 00 00 00 -- 74 00 00 00
           -- 
    nftables_test.go:853: invalid attribute; length too short or too large
--- FAIL: TestGetRules (0.00s)

and another one (might also be due to swapped bytes):

=== RUN   TestListChains
    nftables_test.go:1550: error returned from TestDial unexpected header type: got unknown(0), want unknown(2563)
--- FAIL: TestListChains (0.00s)

There's a final issue that looks like bytes are getting stashed in the wrong direction, and not just swapped:

=== RUN   TestOtherTypes
    binaryutil_test.go:136: PutInt32 failure, expected: []byte{0x78, 0x56, 0x34, 0x12}, got: []byte{0x12, 0x34, 0x56, 0x78}
--- FAIL: TestOtherTypes (0.00s)
FAIL

The full log is available at: https://ci.debian.net/data/autopkgtest/testing/s390x/g/golang-github-google-nftables/29113914/log.gz and I'm attaching it for reference. golang-github-google-nftables_s390x_29113914.log

CyrilBrulebois avatar Dec 10 '22 01:12 CyrilBrulebois

I think the issue might be that the underlying mdlayher/netlink library is not big-endian compatible: https://github.com/mdlayher/netlink/issues/201 (many tests are skipped on big-endian systems in that package).

stapelberg avatar Dec 10 '22 09:12 stapelberg

It is big endian compatible, but a lot of the tests use byte fixtures that are little endian. I don't have access to any big endian machines, but PRs are welcome.

mdlayher avatar Dec 10 '22 16:12 mdlayher

Thanks for the reassuring answer regarding the code's correctness. I've decided to go ahead and upload a package that allows the test suite to fail on big endian archs.

I'm not really a Go developer or a porter, so I can't really provide a pull request to fix the tests. However, I do have access to a s390x porter box (via Debian), and can test any commits or branches you'd like me to.

CyrilBrulebois avatar Dec 12 '22 04:12 CyrilBrulebois

Thanks for having a look, @mdlayher. Please let us know here when a new netlink release is available and I can pull that in :)

stapelberg avatar Dec 12 '22 07:12 stapelberg

No worries! I will try to do that today.

mdlayher avatar Dec 12 '22 10:12 mdlayher

I've released netlink v1.7.1 and genetlink v1.3.1 which fix all known endianness test failures. A lot of my tests hardcode fixed little endian byte dumps which isn't ideal but is useful for development.

mdlayher avatar Dec 12 '22 13:12 mdlayher

Thanks, now pulled in!

@CyrilBrulebois Can you check which issues remain, if any, please?

stapelberg avatar Dec 12 '22 16:12 stapelberg

@stapelberg Sure thing, but as far as I understand:

  • netlink is BE compatible;
  • netlink's tests were not BE compatible; that's confirmed to be fixed in the last release.
  • nftables is believed to be BE compatible as well?
  • nftables's tests are not BE compatible; the updated dependency on netlink doesn't change that, so the failures are exactly the same as before (at least from glancing at them, I can do a more systematic review if you prefer).

CyrilBrulebois avatar Dec 14 '22 00:12 CyrilBrulebois

I only had a brief look thus far, maybe I have misinterpreted what I was seeing.

I’m not sure if I get a chance to have a closer look over the next few weeks, so any help with this is appreciated.

stapelberg avatar Dec 14 '22 07:12 stapelberg

Hi @CyrilBrulebois and @stapelberg,

I have created a draft PR #218 with a fix approach for one test. In the log file, I see that there are a lot of failed test cases and I can change them all. Unfortunately, as there are a lot of tests to change, this requires time and I would like to get this approach reviewed and tested against a BE system before proceeding with a complete refactor.

If you can, please let me know whether this fixed the test on a BE system and whether this approach makes sense. If you have ideas how to best tackle this issue (e.g. skip tests on BE systems? or something else?), feel free to let me know.

turekt avatar Mar 13 '23 19:03 turekt

I don’t have time to test on s390x currently, so if @CyrilBrulebois could give the PR a shot, that’d be great! Thanks.

stapelberg avatar Mar 14 '23 16:03 stapelberg