Infiniband not detected in 8.2

Open Oleszkiewicz opened this issue 4 years ago • 17 comments

When rescanning NICs in 8.2, InfiniBand interfaces (Mellanox ConnectX-3 based) cause an internal error. The same operation on 8.1 works with no problem.

Oleszkiewicz avatar Nov 27 '20 22:11 Oleszkiewicz

Can you get details about the error from the logs? https://xcp-ng.org/docs/troubleshooting.html

stormi avatar Nov 27 '20 22:11 stormi

Sure, how can I send you the bug report from the tool so I don’t publish all the data to GitHub 😊?

Oleszkiewicz avatar Nov 27 '20 23:11 Oleszkiewicz

That's what our support tickets are :) https://support.vates.fr (you need to have an account on xen-orchestra.com)

olivierlambert avatar Nov 28 '20 08:11 olivierlambert

What I get is:

Oops! It seems you don't have access to our support service, please subscribe to a plan to unlock this feature

:P

Oleszkiewicz avatar Nov 28 '20 11:11 Oleszkiewicz

You need to have a registered XOA to unlock the support panel.

edit: our process is not optimal, we'll improve it early in 2021!

olivierlambert avatar Nov 28 '20 14:11 olivierlambert

I don’t use it, can you activate the account manually? Anyway, this is a funny design decision 😊 I want to contribute to the project by providing this bug report and possibly helping to test a fix, but do I really need to deploy and register XOA for that?

Oleszkiewicz avatar Nov 28 '20 15:11 Oleszkiewicz

It's not funny, it's historical. We did Xen Orchestra back in 2014/2015, long before we did XCP-ng. So it takes time to adapt your systems from one product to multiple products (we are working on it).

Also, we offer you for free a way to have private tickets, which is -I think- already a nice perk.

olivierlambert avatar Nov 28 '20 17:11 olivierlambert

True. Good luck with system adaptation then :)

Oleszkiewicz avatar Nov 28 '20 20:11 Oleszkiewicz

It should be far better in a matter of a few months!

olivierlambert avatar Nov 28 '20 20:11 olivierlambert

So, in the end, do the changes from https://github.com/xapi-project/xcp-networkd/pull/185 bring significant benefits, even if there remains work to be done to properly support SR-IOV?

stormi avatar Apr 02 '21 14:04 stormi

Just a bit: the presence of an IB interface does not render the system totally unusable. That is a benefit, yet I could achieve the same by simply unloading the IB kernel module.

To make this “moderately complete” we would not even need to ensure “support” for IB in SR-IOV, only to properly ignore IB in SR-IOV. Currently the presence of an IB interface makes it impossible to start a VM (and I guess the fix is a one-liner for someone proficient with the Xen network code: just add a condition to prevent setting a MAC on a non-ETH interface).

This would not provide real support for IB (by that I mean proper interface recognition, the possibility to manipulate it from the hypervisor, etc.), but at least the hypervisor would “leave it alone” and properly ignore it, so an advanced user could make use of it manually without the hypervisor “knowing” about it in any way.
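To illustrate the kind of condition meant here, the following is a minimal Python sketch, not XAPI code (XAPI is written in OCaml, and where such a check would belong there is not established in this thread). It assumes a Linux host, where `/sys/class/net/<iface>/type` holds the kernel's ARPHRD hardware type (1 for Ethernet, 32 for InfiniBand). The names `is_ethernet`, `set_mac_guarded`, and the `apply_mac` callback are hypothetical and exist only for this example.

```python
from pathlib import Path

# ARPHRD_* hardware type codes from the Linux kernel's if_arp.h.
ARPHRD_ETHER = 1
ARPHRD_INFINIBAND = 32


def link_type(iface, sysfs_root="/sys/class/net"):
    """Return the ARPHRD hardware type the kernel reports for an interface."""
    return int((Path(sysfs_root) / iface / "type").read_text().strip())


def is_ethernet(iface, sysfs_root="/sys/class/net"):
    """True only for Ethernet links; InfiniBand (32) and other types are False."""
    return link_type(iface, sysfs_root) == ARPHRD_ETHER


def set_mac_guarded(iface, mac, apply_mac, sysfs_root="/sys/class/net"):
    """Hypothetical guard: call apply_mac only for Ethernet interfaces.

    apply_mac stands in for whatever actually programs the MAC address;
    it is an assumption of this sketch, not a XAPI function.
    """
    if not is_ethernet(iface, sysfs_root):
        return False  # leave non-Ethernet interfaces alone
    apply_mac(iface, mac)
    return True
```

With a guard of this shape, an InfiniBand port would be skipped rather than handed an Ethernet MAC, which is the "leave it alone" behaviour described above.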

Best Piotr

Oleszkiewicz avatar Apr 02 '21 15:04 Oleszkiewicz

So we discussed it internally and unfortunately making IB work through SR-IOV is not a priority right now for our busy team, so fixing it will either have to rely on the community or go through our commercial channels.

There could be several ways to move forward:

  • a community contribution to the XAPI to make it work
  • realizing that the use case is more common than we thought (right now it doesn't seem it is)
  • for a user having enterprise XCP-ng support we would definitely at least try to put a few days on the issue to see if it is feasible in a reasonable amount of time
  • a sponsored development

stormi avatar May 04 '21 16:05 stormi

This is no longer about making IB work through SR-IOV; it is more about making ETH work in an environment where IB is present. The fix I proposed is simply to ignore IB properly. Currently XCP-ng does a “funny” thing: it talks ETH to an IB interface, which crashes the VM start altogether. I think that properly ignoring IB (or other non-ETH interfaces) can help avoid some hard-to-detect and hard-to-debug problems in the future, as XAPI talking ETH to a non-ETH interface can lead to multiple side effects.

Internally we have solved the problem by using ETH-only mode on the NICs; however, the problem will still surface when a non-ETH interface is present on the same PCI ID as the ETH one. As mentioned, the fix would probably be one more condition in the right spot of the code (basically a check that the interface we want to initialize is Ethernet). Adding one “if” condition should be a 5-minute fix for a person who knows the XAPI code well. Probably less than the discussions about whether it is worth solving :P
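For anyone wanting to check whether a host is in the mixed situation described above, here is a rough Python sketch under stated assumptions. It assumes Linux sysfs, where `/sys/class/net/<iface>/device` is a symlink to the parent device and `/sys/class/net/<iface>/type` is the ARPHRD hardware type (1 means Ethernet). Treating "same PCI ID" as "same `device` symlink target" is an interpretation made for this example, and `mixed_type_devices` is a hypothetical helper, not part of XCP-ng or XAPI.

```python
import os
from collections import defaultdict

ARPHRD_ETHER = 1  # Ethernet hardware type code from the kernel's if_arp.h


def mixed_type_devices(sysfs_root="/sys/class/net"):
    """Group interfaces by parent device and keep only the devices that
    expose at least one Ethernet and one non-Ethernet interface."""
    by_device = defaultdict(dict)
    for iface in sorted(os.listdir(sysfs_root)):
        base = os.path.join(sysfs_root, iface)
        device_link = os.path.join(base, "device")
        if not os.path.exists(device_link):
            continue  # virtual interfaces (lo, bridges) have no parent device
        with open(os.path.join(base, "type")) as f:
            hw_type = int(f.read().strip())
        by_device[os.path.realpath(device_link)][iface] = hw_type
    return {
        dev: ifaces
        for dev, ifaces in by_device.items()
        if any(t == ARPHRD_ETHER for t in ifaces.values())
        and any(t != ARPHRD_ETHER for t in ifaces.values())
    }
```

An empty result would suggest that no device on the host mixes Ethernet and non-Ethernet ports; a non-empty one lists the interfaces that may hit the problem.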

Best Piotr

Oleszkiewicz avatar May 04 '21 16:05 Oleszkiewicz

5 min to modify XAPI, I've never seen that happen. First, it's written in OCaml, which not many people can read, let alone write. And a "person that knows xapi code well" is a scarce resource, and that resource definitely isn't me. Second, what the main XAPI developers told us when we asked them privately about what would remain to be done is way beyond just an "if" to add somewhere (because, contrary to what you think, we've already spent a lot more than 5 minutes on the issue). Third, even if the fix were trivial, any change then triggers a lot of QA before it can become an update for everyone.

So I stand by what I wrote.

stormi avatar May 04 '21 17:05 stormi

Thanks for the clarification about what the exact issue is, though.

stormi avatar May 04 '21 17:05 stormi

Hi,

I understand the scarcity of the resource. If it were trivial for me I would do it myself; however, I do not know OCaml and I don’t know the XAPI code. You mean ignoring IB properly is more than an IF? When I look at the PIF-scan fix that the Citrix team did, the resulting fix was relatively simple. My idea was that it would be a similar fix in another component. As mentioned, I do not know the XAPI code and I would not stake my arm on this.

Best, Piotr

Oleszkiewicz avatar May 04 '21 17:05 Oleszkiewicz

Thanks,

As mentioned before, I have solved the problem we had internally by changing the architecture a bit to accommodate known system issues. I just believed that fixing this, that is, properly ignoring IB and other non-ETH interfaces, would improve the overall quality of the product and could help others avoid unforeseen side effects. I assume IB is just one kind of non-ETH interface that could cause similar problems.

Best Piotr

Oleszkiewicz avatar May 04 '21 17:05 Oleszkiewicz