cloud-init
net.get_interfaces() raises FileNotFoundError when network interface is renamed during enumeration
device_driver() and get_interfaces() may need some work to be resilient to an interface being renamed while devices are being enumerated:
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceAzure.py", line 860, in _get_data
    crawled_data = util.log_time(
  File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2808, in log_time
    ret = func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/cloudinit/sources/helpers/azure.py", line 59, in impl
    return func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceAzure.py", line 639, in crawl_metadata
    self._setup_ephemeral_networking(timeout_minutes=timeout_minutes)
  File "/usr/lib/python3/dist-packages/cloudinit/sources/helpers/azure.py", line 59, in impl
    return func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceAzure.py", line 424, in _setup_ephemeral_networking
    % (iface, net.get_interfaces()),
  File "/usr/lib/python3/dist-packages/cloudinit/net/__init__.py", line 1079, in get_interfaces
    driver = device_driver(name)
  File "/usr/lib/python3/dist-packages/cloudinit/net/__init__.py", line 364, in device_driver
    driver = os.path.basename(os.readlink(driver_path))
FileNotFoundError: [Errno 2] No such file or directory: '/sys/class/net/ib0/device/driver'
Around the same time, the kernel log shows the interfaces being renamed:
[ 36.915981] mlx5_core 0101:00:00.0 ibP257s165943: renamed from ib0
[ 36.953812] mlx5_core 0102:00:00.0 ibP258s165981: renamed from ib0
[ 37.057541] mlx5_core 0103:00:00.0 ibP259s166052: renamed from ib0
[ 37.104314] mlx5_core 0104:00:00.0 ibP260s166131: renamed from ib0
[ 37.157170] mlx5_core 0105:00:00.0 ibP261s166145: renamed from ib0
[ 37.208995] mlx5_core 0106:00:00.0 ibP262s166254: renamed from ib0
[ 37.240758] mlx5_core 0107:00:00.0 ibP263s166326: renamed from ib0
[ 37.276640] mlx5_core 0108:00:00.0 ibP264s166393: renamed from ib0
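One way to make enumeration tolerate this race is to treat a vanished sysfs entry as "this interface no longer exists under this name" rather than as a fatal error. The sketch below is not cloud-init's actual code: `device_driver_safe` and `list_interfaces_with_drivers` are hypothetical helpers, and the sysfs root is a parameter only so the logic can be exercised against a scratch directory.

```python
import os
from typing import Dict, Optional

SYS_CLASS_NET = "/sys/class/net"


def device_driver_safe(name: str, sys_class_net: str = SYS_CLASS_NET) -> Optional[str]:
    """Return the driver name for interface `name`, or None if unavailable.

    Hypothetical helper: os.readlink() raises FileNotFoundError if the
    interface was renamed (or removed) between listing /sys/class/net and
    reading the link, and also for virtual interfaces that have no
    device/driver link at all.
    """
    driver_path = os.path.join(sys_class_net, name, "device", "driver")
    try:
        return os.path.basename(os.readlink(driver_path))
    except FileNotFoundError:
        return None


def list_interfaces_with_drivers(
    sys_class_net: str = SYS_CLASS_NET,
) -> Dict[str, Optional[str]]:
    """Map each interface name to its driver (None if it has none or vanished)."""
    result: Dict[str, Optional[str]] = {}
    for name in sorted(os.listdir(sys_class_net)):
        if not os.path.exists(os.path.join(sys_class_net, name)):
            # The name is already gone (e.g. renamed by udev after listdir()).
            # A rename after this check still yields None below instead of raising.
            continue
        result[name] = device_driver_safe(name, sys_class_net)
    return result
```

This only stops the crash; the returned names can still go stale immediately afterwards, so callers still need to cope with an interface disappearing under them.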
Agreed, but I think the problem is bigger than these two call sites. A retry in get_interfaces() on FileNotFoundError would prevent this exception, but even with such a change, if get_interfaces() gets called before a rename, we're going to see the same issue elsewhere (such as in one of the read_sys_net_safe() calls in find_candidate_nics_on_linux(), which would probably be more difficult to debug because it wouldn't produce a traceback). Anything that expects to use the returned interface name may also hit this issue.
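A minimal sketch of the retry idea being discussed, assuming the retried function re-reads sysfs from scratch on every call. `retry_on_missing_sysfs` is a hypothetical name, and the attempt count and delay are arbitrary illustrative defaults.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_missing_sysfs(
    func: Callable[[], T], attempts: int = 3, delay: float = 0.5
) -> T:
    """Call func(), retrying when it raises FileNotFoundError.

    Hypothetical helper: this can paper over a rename that happens *during*
    a call, but it cannot help callers that already hold an interface name
    that was renamed after func() returned.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except FileNotFoundError:
            if attempt == attempts:
                raise  # Give up and surface the original error.
            time.sleep(delay)  # Give udev time to finish the rename.
    raise AssertionError("unreachable")
```

Usage would look like `retry_on_missing_sysfs(net.get_interfaces)`. As the comment above says, this only narrows the window: the names it returns can still be renamed before the caller uses them.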
Do you know if the rename is triggered by userspace? Also, what distro/release did you see this on?
It's udev renaming them from the generic ib0 to names like ibP257s165943. The devices are just being enumerated late in the process for whatever reason, but we can tell they are being enumerated because each one shows up as ib0 before getting renamed.
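Since the renames come from udev, one possible mitigation is to wait for udev's event queue to drain before enumerating interfaces. This is a sketch that assumes the `udevadm` binary is on PATH. `wait_for_udev_settle` is a hypothetical name, and the `run` parameter exists only so the subprocess call can be stubbed out. One limitation matters here: `udevadm settle` only waits for events that are already queued. If the kernel has not registered a device yet, which fits the "enumerated late" observation, settling cannot cover it.

```python
import subprocess
from typing import Callable


def wait_for_udev_settle(
    timeout_seconds: int = 30,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Wait until udev has processed all currently queued events (e.g. renames).

    Returns True if `udevadm settle` exited with status 0, and False if it
    exited non-zero (for example on timeout) or udevadm is not installed.
    """
    try:
        result = run(
            ["udevadm", "settle", f"--timeout={timeout_seconds}"],
            check=False,
        )
    except FileNotFoundError:
        # The udevadm binary was not found on PATH.
        return False
    return result.returncode == 0
```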