python-rdma
Catch and log exceptions from failed links
I propose adding simple exception handlers for failed links, so that discovery can finish in their presence.
Thank you for the patch,
What scenarios were you able to use this in?
There are many reasons an SMP send during discovery could fail. This seems to deal with the forward direction failing; is that because an SMA is non-responsive or similar?
I would think the most common reason would be a change in the already discovered region, i.e. a link going down?
This helps in two cases observed in our practice:
- an active link is faulty: the switch port is up, but cannot transmit anything;
- a device is non-responsive: its ports are up, but do not respond to MADs.
In both cases python-rdma's discovery does not finish because of unhandled exceptions. By contrast, standard tools such as ibnetdiscover do finish and report the errors they observe. Discovery results with the proposed patch are consistent with those of ibnetdiscover.
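
To illustrate the general idea (not the patch itself), here is a minimal sketch of a discovery walk that logs a failed probe and carries on instead of aborting. It does not use python-rdma's API: `discover`, `toy_probe`, `ProbeError` and the toy topology are all hypothetical names made up for this example, and a real SMP failure would surface as whatever exception the library raises.

```python
import logging
from collections import deque

log = logging.getLogger("discovery")


class ProbeError(Exception):
    """Hypothetical stand-in for a failed or timed-out query to a node."""


def discover(start, probe):
    """Breadth-first walk that records failures instead of aborting.

    `probe(node)` is a hypothetical callable: it returns the node's
    neighbours, or raises ProbeError if the node cannot be queried.
    """
    seen = {start}
    found, failed = [], []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        try:
            neighbours = probe(node)
        except ProbeError as exc:
            # Log the failure and keep going with the rest of the fabric.
            log.warning("probe of %s failed: %s", node, exc)
            failed.append(node)
            continue
        found.append(node)
        for peer in neighbours:
            if peer not in seen:
                seen.add(peer)
                queue.append(peer)
    return found, failed


# Toy fabric (assumption): hca2 is cabled to sw1 and its port is up,
# but it never answers, like the non-responsive device case above.
TOPOLOGY = {
    "sw1": ["sw2", "hca1", "hca2"],
    "sw2": ["sw1", "hca3"],
    "hca1": ["sw1"],
    "hca3": ["sw2"],
}


def toy_probe(node):
    if node not in TOPOLOGY:
        raise ProbeError("no response")
    return TOPOLOGY[node]


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    found, failed = discover("sw1", toy_probe)
    print("discovered:", found)
    print("failed:", failed)
```

With this structure the walk still visits every reachable node, and the failed ones are reported separately, which is roughly the behaviour described for ibnetdiscover above.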