binderhub
Add config option to disable network egress for pods launched w/o a captcha solution
Proposed change
Add a config option to allow BinderHub administrators to turn off network egress for pods launched without a solution to a captcha.
We propose to add an optional parameter to launch requests which contains the solution to a captcha (proof of human-ness). When a valid solution is submitted, the pod is launched with network egress; if no valid solution is submitted, the pod is launched without egress.
Allowing launch requests without a captcha means we don't break API users such as thebe, juniper, or people calling the API directly. The hypothesis is that most of those using these tools do not require network egress. If they do, we need to investigate whether it is possible to obtain a solution to the mybinder.org captcha from a page hosted at example.com or from the command line. After some research I am still not sure what the answer is. A potential avenue might be https://www.hcaptcha.com/accessibility
Limiting the amount of disruption and abuse from miners is tricky to balance with keeping the service easy to use for legitimate users. Removing network egress seems like a good trade-off, especially if we find a way for API users to obtain captcha solutions.
Alternative options
Some alternative ideas we have had:
- Limiting launch requests based on source IP - difficult for courses and such who are behind a proxy so that all users appear to have the same IP
- Requiring authentication to launch pods - removes a major benefit of mybinder.org which is "no signup, no nothing, just go"
- Starting all pods without egress and allowing people to enable it by solving a captcha - would need some UI work inside the pods, and it is unclear if Kubernetes even allows us to do this
- Using "proof of work" (POW) to launch a pod - if launching a new pod is delayed by 20s or so required to solve the POW challenge, that does not really deter abusers who want to use hours of CPU time but is a major slowdown for normal users.
Who would use this feature?
This would benefit operators of large BinderHubs as it reduces the amount of time and effort spent on detecting and blocking abuse. In turn this benefits the users of large BinderHubs because they don't have to share resources with abusers or live with more restrictive resource allocations aimed at making the BinderHub a less attractive target for abusers.
(Optional): Suggest a solution
Add a widget from a service like https://www.hcaptcha.com/ to our launch form and launch pages. Submit the solution with the launch request as an additional query parameter. Check the solution server side and make a decision on the Pod's network egress configuration.
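A minimal sketch of the server-side decision, to make the proposal concrete. It is not BinderHub code: `allow_egress` and `verify_with_hcaptcha` are hypothetical names, and the verification URL and the `secret`/`response`/`success` fields are my understanding of hCaptcha's siteverify API and should be checked against their docs. The verifier is passed in as a callable so the decision logic can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Optional

# Assumption: hCaptcha's server-side verification endpoint.
HCAPTCHA_VERIFY_URL = "https://api.hcaptcha.com/siteverify"


def verify_with_hcaptcha(solution: str, secret: str) -> bool:
    """POST the solution to the verification endpoint and read `success`."""
    data = urllib.parse.urlencode({"secret": secret, "response": solution}).encode()
    with urllib.request.urlopen(HCAPTCHA_VERIFY_URL, data=data, timeout=10) as resp:
        return bool(json.load(resp).get("success", False))


def allow_egress(solution: Optional[str], verify: Callable[[str], bool]) -> bool:
    """Decide whether the launched pod gets network egress.

    A missing solution, or one that fails verification, means no egress.
    The launch itself goes ahead either way, so API users are not broken.
    """
    if not solution:
        return False
    try:
        return verify(solution)
    except Exception:
        # Fail closed: if the captcha service is unreachable, launch without egress.
        return False
```

A launch handler would call something like `allow_egress(query_param, lambda s: verify_with_hcaptcha(s, secret))` and pick the pod's egress configuration from the result. Failing closed is a choice made for this sketch; failing open would be friendlier to users during a captcha outage.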
This idea came out of a discussion with @choldgraf on the twitters (sometimes all the social media is a benefit to humanity after all ;) )
I think this is probably a good idea, though it wouldn't significantly deter abusers who are manually starting sessions to kick off mining runs. Obviously hard to say for sure, but I suspect a significant fraction are doing it by hand in a browser:
- launch binder
- open terminal
- paste kick-off script
- repeat
Adding a captcha click doesn't really impact that workflow. If more folks were scripting mining abuse, I'd expect the cluster to get absolutely crushed more often, since it would not be hard to do.
> Starting all pods without egress and allowing people to enable it by solving a captcha - would need some UI work inside the pods and unclear if kubernetes even allows us to do this
network policies are evaluated based on immutable properties of pods like labels, so I think we'd have to implement a different mechanism like running our own egress proxy through which all egress traffic unconditionally goes, and then have some runtime check to allow/deny requests to go through. I'm not sure how feasible that is if we allow more than just http(s).
> Adding a captcha click doesn't really impact that workflow.
That is a good point. I had assumed that people were scripting things. Though it seems like a few clicks isn't a big deal if you hope to mine for "hours". Almost quicker to click than write the script :-/
You think it is worth the effort to build this?
Running our own, dynamically configured egress proxy sounds like "start a whole new open-source project to create the tool we need" in terms of effort :-/
> You think it is worth the effort to build this?
I honestly don't know! I think it's at least worth the time to investigate how hard it would be and what it might look like.
hmmmm - I had also assumed that people were scripting and automating things as well. If we assume people are doing this manually via the UI, could a more lightweight solution be:
- Every 30 minutes, a little UI window pops up like "still there?"
- If users don't click it after 5 minutes, Binder shuts their pod down
- This repeats indefinitely
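The timing in the list above can be sketched as a small state machine. This only models the 30-minute / 5-minute cycle; the class and state names are made up for illustration, and how the prompt is shown or how the pod is actually shut down is left out.

```python
from dataclasses import dataclass

PROMPT_INTERVAL = 30 * 60  # seconds between "still there?" prompts
RESPONSE_WINDOW = 5 * 60   # seconds the user has to click before shutdown


@dataclass
class PresenceCheck:
    """Tracks when the user last confirmed they are present (hypothetical helper)."""

    last_confirmed: float  # time of launch, or of the most recent click

    def state(self, now: float) -> str:
        elapsed = now - self.last_confirmed
        if elapsed < PROMPT_INTERVAL:
            return "running"
        if elapsed < PROMPT_INTERVAL + RESPONSE_WINDOW:
            return "prompting"
        return "shutdown"

    def confirm(self, now: float) -> None:
        """User clicked the prompt: restart the cycle unless it is already too late."""
        if self.state(now) != "shutdown":
            self.last_confirmed = now
```

Each click restarts the 30-minute timer, which gives the "repeats indefinitely" behaviour.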
I feel like this pattern is used across many services already (looking at you Netflix) so it won't be unfamiliar to users. And maybe it will create enough annoyance for the miners to prevent their abuse.
There are two groups of non-abusers that might be negatively affected by this:
- people who are innocently running long-running jobs on mybinder.org, but I'd argue that is not a use case we want to design for anyway
- people that want to run a long workshop with Binder, and ask people to take a coffee break and wish to return to their session afterward. For this group I think that we still wouldn't think of this as "in-scope" for Binder's workflows...
> I had also assumed that people were scripting and automating things as well.
I definitely could be wrong. My hunch is that we'd see a lot more abuse than we do if more of it was automated. We've definitely seen some automation in the past (identifiable because it launched other binder sessions from within binder).
I think "are you still there" is interesting and could work if everyone were in JupyterLab. I don't know how to make it work for proxied applications, though - RStudio, Voilà, X, where we don't have control over what's on the page.
I wouldn't worry about the coffee breaks because the existing idle culler is probably terminating those sessions anyway.
I feel like we should shoot for the Pareto principle here. What are the minimal steps we can take to deter 80% of the abusers? I don't have a good intuition for this, but it feels like the major question to figure out.
Good point re: JupyterLab (and presumably RetroLab?). Though the more cumbersome we can make it to do the crypto mining, the fewer people will likely do it. We'll never get to 100%, but I think we need to deter enough of them that manually banning the rest is no longer a significant time sink.
Another idea is a bot that runs every day via a cron GitHub Action. It would have the tokens necessary to pull our latest prometheus logs and handle our git-crypt workflow, scan those logs for any repositories that are generating a suspicious amount of activity, and then automatically open a PR with suggested repositories to ban, similar to the henchbot. That's more of an "assume you can't deter crypto abusers, but reduce the labor associated with banning them" approach.
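As a rough sketch of the "scan for suspicious repositories" step only: the function below assumes the bot has already turned the logs into a flat list of launched repository names, and it uses launch count as the only signal. Both are assumptions made for illustration, as are the function name and the threshold; pulling from Prometheus, git-crypt, and opening the PR are not shown.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple


def suspicious_repos(
    launched_repos: Iterable[str],
    min_launches: int = 50,  # hypothetical threshold, would need tuning
    already_banned: Set[str] = frozenset(),
) -> List[Tuple[str, int]]:
    """Return (repo, launch_count) pairs worth a human look, busiest first."""
    counts = Counter(launched_repos)
    flagged = [
        (repo, n)
        for repo, n in counts.items()
        if n >= min_launches and repo not in already_banned
    ]
    # Sort by descending count, then by name so the output is stable.
    return sorted(flagged, key=lambda item: (-item[1], item[0]))
```

The output is only a list of candidates for the PR body. Popular legitimate repositories would also show up with a signal this crude, so a human would still review the PR before anything is banned.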
Also FWIW, I reached out to geoff at gitpod but haven't heard back yet.
To me it sounds like investigating/creating a small prototype to see how it would work with a captcha is worth it. It won't be perfect but it will make it harder for some and/or we learn something.
I think something like an "are you still using this?" popup inside the UI is going to be more tricky, and it is one of my personal pet peeves in terms of UX. Maybe an idea to discuss/shape in a new thread.
Yet another idea (also for a new thread, maybe even on https://github.com/jupyterhub/mybinder.org-deploy/) for dealing with miners: can we fingerprint the mining executables and use a blocklist of md5s(?) to kill Pods that contain/are running executables on that block list?
> can we fingerprint the mining executables and use a blocklist of md5s(?)
In theory, yes. We'd need to implement locating the executables to hash them (not sure how tricky that is on a host system when dealing with containers). I've no idea how many or varied they are.
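The hashing half is the easy part and could look roughly like the sketch below. It swaps md5 for SHA-256, since both are in `hashlib` and SHA-256 is the safer default, and it assumes something else has already worked out which files inside the container to check, which is the hard part mentioned above. The function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Iterable, List, Set


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 hex digest of a file, read in chunks to keep memory use flat."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def blocked_files(paths: Iterable[Path], blocklist: Set[str]) -> List[Path]:
    """Return the paths whose digest appears on the blocklist."""
    return [p for p in paths if p.is_file() and file_digest(p) in blocklist]
```

An exact-hash blocklist only catches byte-identical binaries, so a recompiled or repacked miner would slip through.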
> 2. Requiring authentication to launch pods - removes a major benefit of mybinder.org which is "no signup, no nothing, just go"
Just to say here that this is REALLY appreciated, thanks binder team :) This is sadly not the norm in "open" science.
Is there an existing solution to completely disable pod egress? A lot of HPC sites are doing that and we would like to do the same.
Completely disabling egress for user pods is easy using a K8s NetworkPolicy. For example, mybinder.org already uses a network policy to restrict outgoing ports: https://github.com/jupyterhub/mybinder.org-deploy/blob/3420f08c6c8664bb9f52b4b090af599e6bb6b540/mybinder/values.yaml#L46-L60
Restricting egress for selected user pods based on some condition is more complicated.
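For the simple "no egress at all" case, a sketch of what such a policy could look like, written as a Python dict that can be dumped to JSON and applied with `kubectl apply -f`. This is not the policy mybinder.org deploys. The helper name is made up, and the `component: singleuser-server` label is an assumption about how the user pods are labelled, so check your own deployment.

```python
import json
from typing import Dict


def deny_egress_policy(name: str, pod_labels: Dict[str, str]) -> dict:
    """Build a NetworkPolicy manifest that allows no egress for matching pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name},
        "spec": {
            # Pods are selected by label, which is fixed when the pod is created.
            "podSelector": {"matchLabels": dict(pod_labels)},
            "policyTypes": ["Egress"],
            # Egress is listed in policyTypes but has no rules: nothing is allowed out.
            "egress": [],
        },
    }


if __name__ == "__main__":
    # Assumed label for user pods; adjust to match your deployment.
    policy = deny_egress_policy("deny-user-egress", {"component": "singleuser-server"})
    print(json.dumps(policy, indent=2))
```

Two caveats: this also blocks DNS lookups from the pods, and network policies are additive, so any other policy that allows egress for the same pods (for example one installed by the Helm chart) would need to be disabled or tightened as well. The cluster's network plugin also has to enforce NetworkPolicy for any of this to have an effect.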