Add config option to disable network egress for pods launched w/o a captcha solution

Open betatim opened this issue 3 years ago • 11 comments

Proposed change

Add a config option to allow BinderHub administrators to turn off network egress for pods launched without a solution to a captcha.

We propose to add an optional parameter to launch requests which contains the solution to a captcha (proof of human-ness). When a valid solution is submitted, the pod is launched with network egress; if no valid solution is submitted, the pod is launched without egress.

Allowing launch requests w/o a captcha means we don't break tools built on the API, like thebe or juniper, or people who call the API directly. The hypothesis is that most of those using these tools do not require network egress. If they do, we need to investigate whether it is possible to obtain a solution to the mybinder.org captcha from a page hosted at example.com or from the command line. After some research I am not sure what the answer is. A potential avenue might be https://www.hcaptcha.com/accessibility

Limiting the amount of disruption and abuse from miners is tricky to balance with keeping the service easy to use for legitimate users. Removing network egress seems like a good trade-off, especially if we find a way for API users to obtain captcha solutions.

Alternative options

Some alternative ideas we have had:

  1. Limiting launch requests based on source IP - difficult for courses and such who are behind a proxy so that all users appear to have the same IP
  2. Requiring authentication to launch pods - removes a major benefit of mybinder.org which is "no signup, no nothing, just go"
  3. Starting all pods without egress and allowing people to enable it by solving a captcha - would need some UI work inside the pods and unclear if kubernetes even allows us to do this
  4. Using "proof of work" (POW) to launch a pod - if launching a new pod is delayed by 20s or so required to solve the POW challenge, that does not really deter abusers who want to use hours of CPU time but is a major slowdown for normal users.

Who would use this feature?

This would benefit operators of large BinderHubs as it reduces the amount of time and effort spent on detecting and blocking abuse. In turn this benefits the users of large BinderHubs because they don't have to share resources with abusers or live with more restrictive resource allocations aimed at making the BinderHub a less attractive target for abusers.

(Optional): Suggest a solution

Add a widget from a service like https://www.hcaptcha.com/ to our launch form and launch pages. Submit the solution with the launch request as an additional query parameter. Check the solution server side and make a decision on the Pod's network egress configuration.
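A minimal sketch of the server-side part of that flow, assuming hCaptcha's `siteverify` endpoint is used for the check. The query parameter name `captcha`, the label key `network-egress`, and both function names are hypothetical and not part of BinderHub; the HTTP call is injectable so the decision logic can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request

# hCaptcha's server-side verification endpoint
HCAPTCHA_VERIFY_URL = "https://hcaptcha.com/siteverify"


def _http_post(url, fields):
    """POST form-encoded fields and return the decoded JSON response."""
    data = urllib.parse.urlencode(fields).encode()
    with urllib.request.urlopen(url, data=data, timeout=10) as resp:
        return json.load(resp)


def verify_captcha(token, secret, post=None):
    """Return True only if the captcha service accepts the token."""
    if not token:
        # No solution submitted: treat as unverified, do not reject the launch.
        return False
    post = post or _http_post
    result = post(HCAPTCHA_VERIFY_URL, {"secret": secret, "response": token})
    return bool(result.get("success"))


def egress_label_for_launch(query_params, secret, post=None):
    """Map a launch request to a pod label describing its egress setting.

    "captcha" (parameter) and "network-egress" (label) are hypothetical names.
    """
    allowed = verify_captcha(query_params.get("captcha"), secret, post)
    return {"network-egress": "allowed" if allowed else "denied"}
```

A launch without a token still succeeds here; it just ends up with the `denied` label, which matches the goal of not breaking existing API users.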

This idea came out of a discussion with @choldgraf on the twitters (sometimes all the social media is a benefit to humanity after all ;) )

betatim avatar Nov 23 '21 08:11 betatim

I think this is probably a good idea, though it wouldn't significantly deter abusers who are manually starting sessions to kick off mining runs. Obviously hard to say for sure, but I suspect a significant fraction are doing it by hand in a browser:

  1. launch binder
  2. open terminal
  3. paste kick-off script
  4. repeat

Adding a captcha click doesn't really impact that workflow. If more folks were scripting mining abuse, I'd expect the cluster to get absolutely crushed more often, since it would not be hard to do.

Starting all pods without egress and allowing people to enable it by solving a captcha - would need some UI work inside the pods and unclear if kubernetes even allows us to do this

network policies are evaluated based on immutable properties of pods like labels, so I think we'd have to implement a different mechanism like running our own egress proxy through which all egress traffic unconditionally goes, and then have some runtime check to allow/deny requests to go through. I'm not sure how feasible that is if we allow more than just http(s).
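For the launch-time variant, by contrast, a label set when the pod is created would be enough for a policy to select on. A rough sketch of such a policy, built as a plain Python dict; the function name and the label in the usage note are hypothetical, while the manifest fields are standard Kubernetes `NetworkPolicy` fields.

```python
def deny_egress_policy(name, match_labels):
    """Build a NetworkPolicy manifest that blocks all egress for matching pods.

    Listing "Egress" in policyTypes while leaving the egress rules empty
    means no outgoing traffic is allowed for the selected pods.
    Note that this also blocks DNS lookups.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name},
        "spec": {
            # Pods are selected purely by their labels.
            "podSelector": {"matchLabels": dict(match_labels)},
            "policyTypes": ["Egress"],
            "egress": [],
        },
    }
```

For example, `deny_egress_policy("no-egress", {"network-egress": "denied"})` would apply to every pod launched with that (hypothetical) label and to no others.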

minrk avatar Nov 23 '21 12:11 minrk

Adding a captcha click doesn't really impact that workflow.

That is a good point. I had assumed that people were scripting things. Though it seems like a few clicks isn't a big deal if you hope to mine for "hours". Almost quicker to click than write the script :-/

You think it is worth the effort to build this?

Running our own, dynamically configured egress proxy sounds like "start a whole new open-source project to create the tool we need" in terms of effort :-/

betatim avatar Nov 23 '21 15:11 betatim

You think it is worth the effort to build this?

I honestly don't know! I think it's at least worth the time to investigate how hard it would be and what it might look like.

minrk avatar Nov 23 '21 15:11 minrk

hmmmm - I had also assumed that people were scripting and automating things as well. If we assume people are doing this manually via the UI, could a more lightweight solution be:

  • Every 30 minutes, a little UI window pops up like "still there?"
  • If users don't click it after 5 minutes, Binder shuts their pod down
  • This repeats indefinitely

I feel like this pattern is used across many services already (looking at you Netflix) so it won't be unfamiliar to users. And maybe it will create enough annoyance for the miners to prevent their abuse.
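The timing rules could look roughly like this. It is only a sketch of the bookkeeping: the class and method names are made up for illustration, and it says nothing about how the popup would be shown or how the pod would actually be stopped.

```python
from dataclasses import dataclass

PROMPT_INTERVAL = 30 * 60  # seconds between "still there?" prompts
GRACE_PERIOD = 5 * 60      # seconds the user has to respond


@dataclass
class PresenceCheck:
    """Tracks when the user last confirmed they are still there."""

    last_confirmed: float  # timestamp (seconds) of session start or last click

    def prompt_due(self, now):
        """True once the prompt should be visible."""
        return now - self.last_confirmed >= PROMPT_INTERVAL

    def should_shut_down(self, now):
        """True once the prompt has gone unanswered for the grace period."""
        return now - self.last_confirmed >= PROMPT_INTERVAL + GRACE_PERIOD

    def confirm(self, now):
        """Record a click on the prompt, restarting the cycle."""
        self.last_confirmed = now
```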

There are two groups of non-abusers that might be negatively affected by this:

  • people who are innocently running long-running jobs on mybinder.org, but I'd argue that is not a use case we want to design for anyway
  • people that want to run a long workshop with Binder, and ask people to take a coffee break and wish to return to their session afterward. For this group I think that we still wouldn't think of this as "in-scope" for Binder's workflows...

choldgraf avatar Nov 23 '21 20:11 choldgraf

I had also assumed that people were scripting and automating things as well.

I definitely could be wrong. My hunch is that we'd see a lot more abuse than we do if more of it was automated. We've definitely seen some automation in the past (identifiable because it launched other binder sessions from within binder).

I think "are you still there" is interesting and could work if everyone were in JupyterLab. I don't know how to make it work for proxied applications, though - RStudio, Voilà, X, where we don't have control over what's on the page.

I wouldn't worry about the coffee breaks because the existing idle culler is probably terminating those sessions anyway.

minrk avatar Nov 24 '21 07:11 minrk

I feel like we should shoot for the Pareto principle here. What are the minimal steps we can take to deter 80% of the abusers? I don't have a good intuition for this, but it feels like the major question to figure out.

Good point re: JupyterLab (and presumably RetroLab?). Though the more cumbersome we can make it to do the crypto mining, the fewer people will likely do it. We'll never get 100%, but I think we need to deter enough of them that it is no longer a burden to manually ban the rest.

Another idea is a bot that runs every day via a cron-scheduled GitHub Action. It would have the tokens necessary to pull our latest Prometheus logs and handle our git-crypt workflow, scan those logs for any repositories that are generating a suspicious amount of activity, and then automatically open a PR with suggested repositories to ban, similar to the henchbot. That's more of an "assume you can't deter crypto abusers, but reduce the labor associated with banning them" approach.
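What counts as "suspicious" is left open above. As one illustrative possibility, and purely an assumption, a bot could flag repositories whose daily launch count is far above the median. The function name, the `factor` and `min_launches` thresholds, and the input shape (a mapping of repository to launch count) are all hypothetical.

```python
from statistics import median


def suspicious_repos(launches_per_repo, factor=10.0, min_launches=50):
    """Return repos whose launch count is far above the typical repo.

    launches_per_repo: dict mapping repo name -> launches in the period.
    A repo is flagged when it has at least `min_launches` launches and
    more than `factor` times the median count (both are assumed thresholds).
    """
    if not launches_per_repo:
        return []
    typical = median(launches_per_repo.values())
    flagged = [
        repo
        for repo, count in launches_per_repo.items()
        if count >= min_launches and count > factor * typical
    ]
    # Most active first, so reviewers of the PR see the worst offenders on top.
    return sorted(flagged, key=lambda repo: launches_per_repo[repo], reverse=True)
```

The output is only a list of candidates; a human would still review the resulting PR before anything is banned.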

Also FWIW, I reached out to geoff at gitpod but haven't heard back yet.

choldgraf avatar Nov 24 '21 23:11 choldgraf

To me it sounds like investigating/creating a small prototype to see how it would work with a captcha is worth it. It won't be perfect but it will make it harder for some and/or we learn something.

I think something like an "are you still using this?" popup inside the UI is going to be more tricky and it is one of my personal pet peeves in terms of UX. Maybe an idea to discuss/shape in a new thread.

Yet another idea (also for a new thread, maybe even on https://github.com/jupyterhub/mybinder.org-deploy/) for dealing with miners: can we fingerprint the mining executables and use a blocklist of md5s(?) to kill Pods that contain/are running executables on that block list?

betatim avatar Nov 25 '21 08:11 betatim

can we fingerprint the mining executables and use a blocklist of md5s(?)

In theory, yes. We'd need to implement locating the executables to hash them (not sure how tricky that is on a host system when dealing with containers). I've no idea how many or varied they are.
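The hashing and lookup part is the easy bit; a sketch using only the standard library is below. Locating the executables inside containers is not covered, and `file_md5` and `blocked_files` are illustrative names, not existing tooling.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def blocked_files(paths, blocklist):
    """Return the paths whose MD5 digest appears in the blocklist (a set of hex strings)."""
    return [path for path in paths if file_md5(path) in blocklist]
```

An exact-hash blocklist only catches byte-identical binaries, so a recompiled or padded miner would not match.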

minrk avatar Nov 25 '21 19:11 minrk

2. Requiring authentication to launch pods - removes a major benefit of mybinder.org which is "no signup, no nothing, just go"

Just to say here that this is REALLY appreciated, thanks binder team :) This is sadly not the norm in "open" science.

ltetrel avatar Feb 15 '22 21:02 ltetrel

Is there an existing solution to completely disable pod egress? A lot of HPC sites do that, and we would like to do the same.

ltetrel avatar Feb 15 '22 22:02 ltetrel

Completely disabling egress for user pods is easy using a K8s NetworkPolicy. For example, mybinder.org already uses a network policy to restrict outgoing ports: https://github.com/jupyterhub/mybinder.org-deploy/blob/3420f08c6c8664bb9f52b4b090af599e6bb6b540/mybinder/values.yaml#L46-L60

Restricting egress for selected user pods based on some condition is more complicated.

manics avatar Feb 15 '22 23:02 manics