
OVH / mlpack.org firewall issue

Open · rcurtin opened this issue on Oct 21, 2022 · 7 comments

Hi there everyone,

I am trying to track down what appears to be a strange firewall issue that shows up only on OVH binder notebook instances. I run the mlpack open-source machine learning library, and many of the examples in our examples repository first fetch data from datasets.mlpack.org. Specifically, from an OVH instance (e.g. a notebook running on 51.178.95.56), connections to datasets.mlpack.org (209.195.13.98) simply time out. I've checked the firewall configuration on datasets.mlpack.org and found no issues there, and notebooks running on non-OVH servers connect fine. So it seems likely to me that some OVH firewall rule is blocking access to datasets.mlpack.org.

For an easy reproduction, start a binder instance on OVH that gives you a shell, then try to fetch anything from datasets.mlpack.org (e.g. with wget); the request simply times out.
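
Something along these lines; the exact flags are illustrative, and any HTTP client shows the same behaviour:

```
# from a terminal in an OVH-hosted binder session: the request never completes
wget --timeout=15 --tries=1 http://datasets.mlpack.org/
# curl gives a bit more detail about where it stalls
curl -v --connect-timeout 15 http://datasets.mlpack.org/
```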

Could someone here help with that, or point me to the right place to get the issue resolved? Thanks so much!

(I originally posted this in the Gitter chat, but @consideRatio suggested I open an issue here instead.)

rcurtin avatar Oct 21 '22 17:10 rcurtin

Thank you for opening your first issue in this project! Engagement like this is essential for open source projects! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please try to follow the issue template as it helps other community members to contribute more effectively. You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:

welcome[bot] avatar Oct 21 '22 17:10 welcome[bot]

Thanks for reporting! This appears to be a problem with node-1 in the OVH cluster. I can reproduce it with non-user pods on node-1, and I can connect to that host from user pods on other nodes.
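
(For reference, egress from a specific node can be tested with a throwaway pod pinned to it; the image, pod name, and node name below are placeholders rather than the exact commands I ran.)

```
# hypothetical debug pod pinned to node-1 via spec.nodeName
kubectl run egress-check --image=curlimages/curl --restart=Never \
  --overrides='{"apiVersion": "v1", "spec": {"nodeName": "node-1"}}' \
  --command -- curl -v --connect-timeout 10 http://209.195.13.98/
# inspect the result, then clean up
kubectl logs egress-check
kubectl delete pod egress-check
```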

Unfortunately, we can't easily cordon node-1 since it's where the ingress controller has to be (for now).

https://github.com/jupyterhub/mybinder.org-deploy/pull/2379 should keep user pods away from node-1 while we work it out. @consideRatio can you have a look if that makes sense?
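
(For context, one generic way to keep pods off a node is a taint that only the ingress controller tolerates; the PR may well use a different mechanism, so treat this as a sketch only.)

```
# taint node-1 so pods without a matching toleration are no longer scheduled there;
# the ingress controller would need a corresponding toleration in its own pod spec
kubectl taint nodes node-1 dedicated=ingress:NoSchedule
```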

@mael-le-gal can you reboot node-1 to see if that fixes the issue?

minrk avatar Oct 24 '22 08:10 minrk

@minrk I just rebooted node-1

mael-le-gal avatar Oct 24 '22 08:10 mael-le-gal

@mael-le-gal thanks! The issue still appears, so there's something special about node-1 that's preventing egress to 209.195.13.98. Weirdly, most other sites still work, and the same egress destination can be reached from other nodes.

minrk avatar Oct 24 '22 08:10 minrk

Could it be that there is/was a lot of traffic from node-1 and it got rate-limited on the mlpack side (via a generic rate-limiting rule)?
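
(If such a generic rule existed, it should show up in the iptables rule set on the mlpack side; a check along these lines would reveal it, with the IP below being the notebook address from the original report.)

```
# list any rules that use the limit/hashlimit/recent match modules
iptables -S | grep -Ei 'limit|recent'
# if the xt_recent module is in use, its per-list state (including tracked IPs) lives here
grep -r 51.178.95.56 /proc/net/xt_recent/ 2>/dev/null
```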

betatim avatar Oct 24 '22 09:10 betatim

Hm, could be, but I think that's unlikely: I would expect all the nodes to share the same egress IP (not sure about that), in which case a rate limit keyed on the source address would hit every node, not just node-1. I'm not really sure how to debug further.
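
(One way to check the shared-egress-IP assumption would be to compare what pods on each node see as their public address; ifconfig.me is just an example echo service.)

```
# list the nodes, then run the same check from a pod on each one and compare
kubectl get nodes -o name
# inside a pod on a given node: print the public IP the outside world sees;
# differing answers would point at per-node NAT/egress rather than a shared gateway
curl -s https://ifconfig.me; echo
```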

minrk avatar Oct 24 '22 10:10 minrk

On the mlpack side we don't have any rate-limiting set up. The system is just a 1U box I threw in a rack somewhere and administer manually; no nice proxy or "advanced setup" of any sort in front of it. :) While playing with this issue, I temporarily disabled all iptables rules on mlpack.org just to double-check, but there was no change. I also went through all the iptables rules and didn't find any that would block node-1 on either port 80 or 443.
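
(Roughly that kind of check, for anyone following along; the exact chain names and file path below are illustrative rather than the literal commands I ran.)

```
# save the current rules, then temporarily open the firewall completely
iptables-save > /root/iptables-backup.rules
iptables -P INPUT ACCEPT
iptables -F INPUT
# ...retest from the OVH node, then put the original rules back
iptables-restore < /root/iptables-backup.rules
# separately, look for any rule touching port 80/443 or the OVH source addresses
iptables -L INPUT -n -v | grep -E ':80|:443|51\.178\.'
```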

rcurtin avatar Oct 25 '22 15:10 rcurtin

I tried this again today, with an outbound IP (from binder) of 51.68.77.249, and the request succeeded. Are there other OVH nodes I can check with? I tried a few times and always found myself with that outbound IP. I'd like to check again with 51.178.95.56 just to be sure the issue is resolved.
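
(For reference, the outbound-IP check is just the usual echo-service trick from a terminal in the binder session.)

```
# which public IP does this session's traffic leave from?
curl -s https://ifconfig.me; echo
# then retry the fetch that used to hang
wget --timeout=15 --tries=1 http://datasets.mlpack.org/
```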

rcurtin avatar Nov 17 '22 19:11 rcurtin

@rcurtin can you actually try with https://ovh2.mybinder.org ? We are in the process of deploying a whole new cluster for the OVH federation member, so if there are any issues specific to the current cluster, they should go away next week.

minrk avatar Nov 18 '22 08:11 minrk

It seems like everything works from ovh2.mybinder.org. So I guess if the old cluster goes away next week, we can resolve this then. :) Thanks for the help!

rcurtin avatar Nov 18 '22 19:11 rcurtin

Thanks for testing, @rcurtin!

minrk avatar Nov 21 '22 09:11 minrk