contrib.openstack.cert_utils.get_certificate_request() can generate a None-keyed request dict
contrib.openstack.cert_utils may generate a request dict keyed by None if an IP address exists on the system that is not resolvable.
In my case I have an IPv6 address fc00:4248:fae8:2d6d:f816:3eff:fe27:ec30 that doesn't resolve to anything.
6: o-hm0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 8950 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:27:ec:30 brd ff:ff:ff:ff:ff:ff
    inet6 fc00:4248:fae8:2d6d:f816:3eff:fe27:ec30/64 scope global dynamic mngtmpaddr noprefixroute
       valid_lft 86352sec preferred_lft 14352sec
    inet6 fe80::3c60:e3ff:fef8:60fe/64 scope link
       valid_lft forever preferred_lft forever
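A quick way to confirm there is no reverse DNS entry for that address (my own illustrative check, not output from the charm):

```python
# Illustrative only: show that the global IPv6 address has no PTR record.
import socket

try:
    print(socket.gethostbyaddr('fc00:4248:fae8:2d6d:f816:3eff:fe27:ec30'))
except (socket.herror, socket.gaierror) as exc:
    # This is the situation on the affected unit: the reverse lookup fails,
    # which is presumably why get_hostname() ends up returning None below.
    print('no reverse DNS:', exc)
```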
This leads to a dict of requests returned from get_certificate_request() like:
{
    'octavia.example.com': {'sans': ['100.115.0.168', '100.115.0.56', 'octavia.example.com']},
    None: {'sans': ['fc00:4248:fae8:2d6d:f816:3eff:fe27:ec30']}
}
The None key causes a huge number of problems downstream.
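One way to see why (my own illustration, assuming the dict is JSON-encoded onto the relation as-is): json.dumps() silently turns a None key into the string "null", so the receiving side ends up with a bogus CN.

```python
import json

# Hypothetical reproduction of what would land on the relation with a None CN.
requests = {None: {'sans': ['fc00:4248:fae8:2d6d:f816:3eff:fe27:ec30']}}
print(json.dumps(requests))
# -> {"null": {"sans": ["fc00:4248:fae8:2d6d:f816:3eff:fe27:ec30"]}}
```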
Here it tries to resolve the address without any fallback: https://github.com/juju/charm-helpers/blob/28f26b780c83d18fe1c7a8ab84c29581e6191879/charmhelpers/contrib/openstack/cert_utils.py#L93
My workaround is changing that line to 'cn': get_hostname(ip) or ip,
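For context, a minimal sketch of what the fallback does (the helper function and the 'sans' layout are mine for illustration; only the 'or ip' expression is the actual change, and get_hostname comes from charmhelpers.contrib.network.ip):

```python
from charmhelpers.contrib.network.ip import get_hostname

def cert_request_entry(ip):
    # get_hostname() returns None when the address has no reverse DNS entry
    # (the fc00:... case above); falling back to the bare IP avoids a None CN.
    return {'cn': get_hostname(ip) or ip, 'sans': [ip]}

# e.g. cert_request_entry('fc00:4248:fae8:2d6d:f816:3eff:fe27:ec30')
# -> {'cn': 'fc00:4248:fae8:2d6d:f816:3eff:fe27:ec30', 'sans': [...]}
```

The point is simply that when the reverse lookup fails, the bare address is a far better key than None.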
To clarify: it's really not clear under which circumstances this happens. I observed it with the Octavia charm. (None of the other units have IPv6, so it was probably never triggered elsewhere.)
The code around where it happens looks totally benign; it's basically the plain-vanilla tls-certificates interface layer.
What is utterly strange, however, is that the issue is triggered, or at least becomes visible, only during juju agent upgrades or when the model is migrated to a different controller. I have no clue what environmental factors come into play there.
If you reboot the instance the bug goes away, but the IPv6 address stays.
I traced the issue down to this line, and no matter what triggers it, a request dict with None as a key can't be good. Nothing in the tls-certificates layer looks out of place in any way.
Before this, vault only had one request; after the juju upgrade-model and with this fix to charmhelpers applied, there are two.
Before (juju show-unit vault/0):
octavia/1:
  in-scope: true
  data:
    cert_requests: '{"juju-52da4d-22-lxd-21.example.com": {"sans": ["100.115.0.168", "100.115.0.56"]}}'
    certificate_name: 91a7d86f-0e14-405d-a722-7c3e557b3dbb
    common_name: octavia.example.com
    egress-subnets: 100.115.0.168/32
    ingress-address: 100.115.0.168
    private-address: 100.115.0.168
    sans: '["100.115.0.168", "100.115.0.56", "octavia.example.com"]'
    unit_name: octavia_1
After (juju show-unit vault/0):
octavia/1:
  in-scope: true
  data:
    cert_requests: '{"fc00:4248:fae8:2d6d:f816:3eff:fe27:ec30": {"sans": ["fc00:4248:fae8:2d6d:f816:3eff:fe27:ec30"]}, "juju-52da4d-22-lxd-21.example.com": {"sans": ["100.115.0.168", "100.115.0.56"]}}'
    certificate_name: 91a7d86f-0e14-405d-a722-7c3e557b3dbb
    common_name: octavia.example.com
    egress-subnets: 100.115.0.168/32
    ingress-address: 100.115.0.168
    private-address: 100.115.0.168
    sans: '["100.115.0.168", "100.115.0.56", "octavia.example.com"]'
    unit_name: octavia_1
Having IPv6 addresses floating around in vault caused further issues.
Am I really the only one seeing this?
What's special here is that we only use IPv6 for the amphoras. Perhaps that is unusual?
Also, I found no way of rebuilding my charm from a released version (with build.lock) together with my patched charmhelpers; I had to manually patch the zip files. I really need a long-term solution for that.
@fnordahl have you seen this one before?