keepalived
keepalived copied to clipboard
LTO-enabled build produces broken VRRPv3 checksums on Ubuntu 22.04
Describe the bug
When built on Ubuntu 22.04 x86_64 with --enable-lto
, keepalived produces broken VRRPv3 checksums.
When built without LTO, the binary produces correct VRRPv3 checksums.
To Reproduceg
On Ubuntu 22.04, build current git master 0cdeb605dbcbec987fe7c1d0b152586e30e209b5
, with:
./configure --build=x86_64-linux-gnu --prefix=/usr --includedir=\${prefix}/include --mandir=\${prefix}/share/man --infodir=\${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-option-checking --disable-silent-rules --libdir=\${prefix}/lib/x86_64-linux-gnu --libexecdir=\${prefix}/lib/x86_64-linux-gnu --disable-maintainer-mode --disable-dependency-tracking --enable-bfd --disable-lvs --disable-snmp --with-init=none --enable-lto
Then launch the resulting binary with the following configuration:
global_defs {
vrrp_version 3
}
vrrp_instance VI_1 {
interface ens5
priority 101
virtual_router_id 51
advert_int 1
accept
garp_master_refresh 5
garp_master_refresh_repeat 1
unicast_src_ip 10.1.18.148/24
unicast_peer {
10.1.18.218
}
virtual_ipaddress {
127.0.0.15
}
}
On a backup node then observe the logs with the following statements:
Jul 13 13:35:45 ip-10-1-18-218 Keepalived_vrrp[12123]: (VI_1) Invalid VRRPv3 checksum
Jul 13 13:36:00 ip-10-1-18-218 Keepalived_vrrp[12123]: (VI_1) Invalid VRRPv3 checksum
Or in tcpdump -vv -ni ens5 vrrp
:
16:37:42.817045 IP (tos 0xc0, ttl 255, id 70, offset 0, flags [none], proto VRRP (112), length 32)
10.1.18.148 > 10.1.18.218: VRRPv3, Advertisement, vrid 51, prio 101, intvl 100cs, length 12, (bad vrrp cksum 55b0), addrs: 127.0.0.15
16:37:43.817076 IP (tos 0xc0, ttl 255, id 71, offset 0, flags [none], proto VRRP (112), length 32)
10.1.18.148 > 10.1.18.218: VRRPv3, Advertisement, vrid 51, prio 101, intvl 100cs, length 12, (bad vrrp cksum 55b0), addrs: 127.0.0.15
Expected behavior I expect keepalived built with LTO on Ubuntu 22.04 to provide a correctly working unicast VRRP checksums
Keepalived version
Keepalived v2.2.7 (07/05,2022), git commit v2.2.7-55-g0cdeb605
Copyright(C) 2001-2022 Alexandre Cassen, <[email protected]>
Built with kernel headers for Linux 5.15.30
Running on Linux 5.15.0-1004-aws #6-Ubuntu SMP Thu Mar 31 09:44:20 UTC 2022
Distro: Ubuntu 22.04 LTS
configure options: --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --disable-option-checking --disable-silent-rules --libdir=${prefix}/lib/x86_64-linux-gnu --libexecdir=${prefix}/lib/x86_64-linux-gnu --disable-maintainer-mode --disable-dependency-tracking --enable-bfd --disable-lvs --disable-snmp --with-init=none --enable-lto build_alias=x86_64-linux-gnu
Config options: VRRP VRRP_AUTH VRRP_VMAC BFD OLD_CHKSUM_COMPAT INIT=none
System options: VSYSLOG MEMFD_CREATE IPV6_MULTICAST_ALL IPV4_DEVCONF RTA_ENCAP RTA_EXPIRES RTA_NEWDST RTA_PREF FRA_SUPPRESS_PREFIXLEN FRA_SUPPRESS_IFGROUP FRA_TUN_ID RTAX_CC_ALGO RTAX_QUICKACK RTEXT_FILTER_SKIP_STATS FRA_L3MDEV FRA_UID_RANGE RTAX_FASTOPEN_NO_COOKIE RTA_VIA FRA_PROTOCOL FRA_IP_PROTO FRA_SPORT_RANGE FRA_DPORT_RANGE RTA_TTL_PROPAGATE IFA_FLAGS LWTUNNEL_ENCAP_MPLS LWTUNNEL_ENCAP_ILA NET_LINUX_IF_H_COLLISION LIBIPTC_LINUX_NET_IF_H_COLLISION VRRP_IPVLAN IFLA_LINK_NETNSID GLOB_BRACE GLOB_ALTDIRFUNC INET6_ADDR_GEN_MODE VRF SO_MARK
Distro (please complete the following information):
- Name: Ubuntu
- Version: 22.04
- Architecture: x86_64
Configuration file:
Specified in the "how to reproduce"
Additional context
I've actually noticed this issue when I was doing regular deb package builds for keepalived on Ubuntu. They do not explicitely enable LTO via --enable-lto
, but rather provide LTO flags via CFLAGS/LDFLAGS by default:
export CFLAGS=$(dpkg-buildflags --get CFLAGS)
export LDFLAGS=$(dpkg-buildflags --get LDFLAGS)
See for more details on how Ubuntu does this: https://wiki.ubuntu.com/ToolChain/LTO
In reality, I think I can get away by skipping LTO for my local builds, but I assume this might hit in other unexpected ways in the future, so maybe fixing the real cause is a better idea -- hence this bug report.
I'll install a Ubuntu 22.04 VM and see if I can reproduce this. If I can then I'll also try it on some other systems with different compiler versions.
My initial thought is that using LTO or not shouldn't make a difference to the functionality of the code produced, which would imply a toolchain issue rather than a keepalived issue. I'll keep an open mind though until I have tested it.
I can reproduce this problem on Ubuntu 22.04 (gcc 11.2), and also on Fedora 36 (gcc 12.1.1), Fedora 34 (gcc 11.3.1), Fedora Rawhide (gcc 12.1.1). I do NOT get the problem on Ubuntu 22.04 using clang (ver 14.0.0), or on Fedora34 when using clang (version 12.0.1).
So this appears to be a gcc problem, since the problem does not occur clang, and it does not occur with gcc if it is not use LTO.
On inspecting in_csum() in lib/utils.c it was evident that there were a couple of optimisations that could be made. Commit 7b32ab3 appears to be the simplest optimization that stops this GCC LTO bug being exposed. I have tested this on Fedora 34, but not on Ubuntu 22.04.
Since this is a GCC bug, and not a keepalived bug, there is no guarantee that this change will workaround the problem with all versions of GCC, or that future version of GCC won't expose the problem again.
This issue needs to be reported as a GCC bug. @thresheek are you prepared to report this at gnu.org. If not I will report it, but I don't have the time, or the understanding, to produce a minimal test case; I would simply report it that there is a problem resolved by commit 7b32ab3.
Hi @pqarmitage - many thanks for a quick workaround! I'll test that locally on my side with gcc too.
I'm not really in a knowledgeable position to open a gcc bug for the same reasons you mentioned - no knowledge of code or what's going on to prepare a test case. I would really appreciate if you open one even if it would lack details to actually provide a test case.
Hmm, unfortunately I still experience the same behaviour on Ubuntu 22.04 when built with gcc version 11.2.0 (Ubuntu 11.2.0-19ubuntu1)
:
This fails (invalid VRRPv3 checksum):
Keepalived v2.2.7 (07/14,2022), git commit v2.2.7-59-g11308ecb
Copyright(C) 2001-2022 Alexandre Cassen, <[email protected]>
Built with kernel headers for Linux 5.15.30
Running on Linux 5.15.0-1004-aws #6-Ubuntu SMP Thu Mar 31 09:44:20 UTC 2022
Distro: Ubuntu 22.04 LTS
configure options: --disable-lto
Config options: LVS VRRP VRRP_AUTH VRRP_VMAC OLD_CHKSUM_COMPAT INIT=systemd
System options: VSYSLOG MEMFD_CREATE IPV6_MULTICAST_ALL IPV4_DEVCONF RTA_ENCAP RTA_EXPIRES RTA_NEWDST RTA_PREF FRA_SUPPRESS_PREFIXLEN FRA_SUPPRESS_IFGROUP FRA_TUN_ID RTAX_CC_ALGO RTAX_QUICKACK RTEXT_FILTER_SKIP_STATS FRA_L3MDEV FRA_UID_RANGE RTAX_FASTOPEN_NO_COOKIE RTA_VIA FRA_PROTOCOL FRA_IP_PROTO FRA_SPORT_RANGE FRA_DPORT_RANGE RTA_TTL_PROPAGATE IFA_FLAGS LWTUNNEL_ENCAP_MPLS LWTUNNEL_ENCAP_ILA NET_LINUX_IF_H_COLLISION LIBIPTC_LINUX_NET_IF_H_COLLISION IPVS_DEST_ATTR_ADDR_FAMILY IPVS_SYNCD_ATTRIBUTES IPVS_64BIT_STATS IPVS_TUN_TYPE IPVS_TUN_CSUM IPVS_TUN_GRE VRRP_IPVLAN IFLA_LINK_NETNSID GLOB_BRACE GLOB_ALTDIRFUNC INET6_ADDR_GEN_MODE VRF SO_MARK
This works (checksums are ok):
Keepalived v2.2.7 (07/14,2022), git commit v2.2.7-59-g11308ecb
Copyright(C) 2001-2022 Alexandre Cassen, <[email protected]>
Built with kernel headers for Linux 5.15.30
Running on Linux 5.15.0-1004-aws #6-Ubuntu SMP Thu Mar 31 09:44:20 UTC 2022
Distro: Ubuntu 22.04 LTS
configure options: --enable-lto
Config options: LVS VRRP VRRP_AUTH VRRP_VMAC OLD_CHKSUM_COMPAT INIT=systemd
System options: VSYSLOG MEMFD_CREATE IPV6_MULTICAST_ALL IPV4_DEVCONF RTA_ENCAP RTA_EXPIRES RTA_NEWDST RTA_PREF FRA_SUPPRESS_PREFIXLEN FRA_SUPPRESS_IFGROUP FRA_TUN_ID RTAX_CC_ALGO RTAX_QUICKACK RTEXT_FILTER_SKIP_STATS FRA_L3MDEV FRA_UID_RANGE RTAX_FASTOPEN_NO_COOKIE RTA_VIA FRA_PROTOCOL FRA_IP_PROTO FRA_SPORT_RANGE FRA_DPORT_RANGE RTA_TTL_PROPAGATE IFA_FLAGS LWTUNNEL_ENCAP_MPLS LWTUNNEL_ENCAP_ILA NET_LINUX_IF_H_COLLISION LIBIPTC_LINUX_NET_IF_H_COLLISION IPVS_DEST_ATTR_ADDR_FAMILY IPVS_SYNCD_ATTRIBUTES IPVS_64BIT_STATS IPVS_TUN_TYPE IPVS_TUN_CSUM IPVS_TUN_GRE VRRP_IPVLAN IFLA_LINK_NETNSID GLOB_BRACE GLOB_ALTDIRFUNC INET6_ADDR_GEN_MODE VRF SO_MARK
@thresheek If I'm reading what you have written above correctly, it is now working with LTO, but it is not working without LTO. Can you please confirm this, since it is the opposite of the situation yesterday.
I have now tested this on my Ubuntu 22.04 VM, and it is working for me both with and without LTO. I am using the same version of gcc as you and the same git commit.
Hmmmm, I think I did indeed somehow mess up the tests I did in the morning. I've recreated a new VM, ran those tests, and can no longer reproduce - it works as expected on current git HEAD with --enable-lto, --disable-lto, and just passing -flto and friends via CFLAGS and LDFLAGS directly. I have also tested that again on the VM I initially used for tests and it also behaves as expected there.
So, in all, disregard my comment in https://github.com/acassen/keepalived/issues/2160#issuecomment-1184369569 as an absolutely wrong one.
Thank you!
Well that's a relief. I now need to sort out reporting this as a gcc bug.
I have looked further at commit 7b32ab3 and it is utter nonsense. All it is doing is stopping checksum checks working properly, and any such checks will now always succeed.
I will revert the commit once a find a real workaround to the gcc bug.
I have now reverted commit 7b32ab3 and added commit f7341ee which should provide a proper workaround for the problem, with no incorrect side effects.
The solution was to mark the in_csum() function as noinline if building with GCC and LTO.
I have now reverted commit #7b32ab3 and added commit #f7341ee which should provide a proper workaround for the problem, with no incorrect side effects.
The solution was to mark the in_csum() function as noinline if building with GCC and LTO.
This issue has now been reported as GCC bug 106706.
Well, it transpires that it is not a GCC bug, but a strict aliasing issue.
As Andrew Pinski from gcc.gnu.org states:
This is an obvious C/C++ aliasing violation.
ipv4_phdr_t ipv4_phdr;
...
in_csum(({ ( uint16_t *) (&ipv4_phdr); }), sizeof(ipv4_phdr), 0, &vrrp->ipv4_csum);
inside in_csum:
uint16_t
in_csum(const uint16_t *addr, size_t len, uint32_t csum, uint32_t *acc)
....
while (nleft > 1) {
csum += *addr++;
You need to use either -fno-strict-aliasing or you need to mark the argument addr in in_csum as may_alias.
I don't want to add -fno-strict-aliasing
due to the performance implications. I have tried marking addr
in in_csum
as may_alias
, and also various other may_alias
uses, but I can't find something that works yet.
I have now merge commit 77415a2 that makes in_csum() safe for strict aliasing. However I suspect there is quite a bit of code in keepalived that is not safe for strict aliasing, so commit f62872d adds --fno-strict-aliasing
to CFLAGS until we have checked all casts and ensured that it is safe to re-enable strict aliasing.
I still need to sort out why marking addr with attribut may_alias doesn't appear to work (note to self: test with GCC 12.2).
I have now worked out how to use attribute may_alias. It appears that it has to be specified in a typedef, and then the function parameter needs to be specified using that typedef. So for example, this works:
typedef float __attribute__((may_alias)) float_a;
static int
foo(float_a *f, int *i)
{
*i = 1;
*f = 0.f;
return *i;
}
but this doesn't work:
static int
foo(float __attribute((__may_alias__)) *f, int *i)
{
*i = 1;
*f = 0.f;
return *i;
}
The problem is that not only does it not work, but GCC doesn't give any warning or error about the attribute being ignored or meaningless.
I'll have a look at what is best to do with keepalived. In the meantime it is using -fno-strict-aliasing
Commit aef1b58 changes in_csum to use the may_alias attribute by default.
Many thanks for solving this!