DAOS-11796 control: Consistently resolve MS replica addresses

Open mjmac opened this issue 2 years ago • 5 comments

In environments where access_points hostnames can resolve to multiple IP addresses in a nondeterministic manner, we can run into problems due to MS peers not recognizing each other. This patch works around the problem by pinning each replica to the lowest IP address in the set of addresses associated with each replica's hostname.

Signed-off-by: Michael MacDonald [email protected]

mjmac avatar Oct 10 '22 19:10 mjmac

Bug-tracker data:
Ticket title is 'daos container create failed: "DER_NONEXIST(-1005): The specified entity does not exist"'
Status is 'In Review'
Labels: 'HPE_dep,tds,triaged'
https://daosio.atlassian.net/browse/DAOS-11796

github-actions[bot] avatar Oct 10 '22 19:10 github-actions[bot]

Test stage NLT completed with status FAILURE. https://build.hpdd.intel.com//job/daos-stack/job/daos/view/change-requests/job/PR-10535/1/execution/node/831/log

daosbuild1 avatar Oct 10 '22 20:10 daosbuild1

Is there a master version of this PR, too? Or did that already land?

kjacque avatar Oct 12 '22 01:10 kjacque

> Is there a master version of this PR, too? Or did that already land?

No, not yet. I was initially thinking that this was just a workaround for the 2.2.x series, and I wanted to get feedback on the approach from Aurora testing. I may go ahead and land it on master while I consider a more intrusive change for the 2.4+ series. I haven't yet decided whether or not it's an issue that the raft stuff only knows about a single IP address for a given hostname. Maybe it's actually fine?

mjmac avatar Oct 12 '22 13:10 mjmac

> I haven't yet decided whether or not it's an issue that the raft stuff only knows about a single IP address for a given hostname. Maybe it's actually fine?

When I was looking into similar issues previously, I had assumed we'd want raft to know about all the IPs associated with a hostname so we could match against any of them. But with this solution I don't think that would be necessary.

kjacque avatar Oct 12 '22 22:10 kjacque