[RFC] Replace Java Security Manager (JSM)
Is your feature request related to a problem? Please describe.
It was announced a while ago that SecurityManager is going to be phased out of the JDK. The first step, the deprecation of the SecurityManager (JEP-411), landed in JDK 17 and issues the following warning on OpenSearch builds or server startup:
WARNING: System::setSecurityManager will be removed in a future release
JDK 18 pushes it even further and now fails on startup (please see https://bugs.openjdk.java.net/browse/JDK-8270380): running OpenSearch builds or the server on JDK 18 EA fails with:
Caused by: java.lang.UnsupportedOperationException: The Security Manager is deprecated and will be removed in a future release
at java.base/java.lang.System.setSecurityManager(System.java:416)
It now requires a JVM command line option to enable it explicitly (please see [1]):
-Djava.security.manager=allow
- [x] Support JDK 18 EA builds (https://github.com/opensearch-project/OpenSearch/pull/1710)
Describe the solution you'd like
There is no alternative or replacement for the SecurityManager (to understand why, Project Loom is to "blame"), please see [2]. One of the options is to just drop it; it sounds risky, but combined with Plugin Sandbox (please see [3], [4]) it may be a viable option. Other options include (but are not limited to): bytecode instrumentation, a Java agent, a custom classloader.
Describe alternatives you've considered
We could keep it as long as we can, but once removed from the JDK, it will be a problem.
Additional context
The upcoming JDK-24 release disables SecurityManager permanently [6].
Please see the links.
[1] https://inside.java/2021/12/06/quality-heads-up/
[2] https://inside.java/2021/04/23/security-and-sandboxing-post-securitymanager/
[3] https://github.com/opensearch-project/OpenSearch/issues/1572
[4] https://github.com/opensearch-project/OpenSearch/issues/1422
[5] A possible JEP to replace SecurityManager after JEP 411
[6] https://github.com/openjdk/jdk/pull/21498
@nknize suggested we remove security manager in 2.0, labelling issue as such - once we have agreed here on what to do for this issue let's open a campaign parent issue in https://github.com/opensearch-project/opensearch-plugins/
@dblock would you mind if I submit a small patch for 1.3.x+ so it could be run on JDK 18? Thank you
PS: To clarify why, JDK 18 is scheduled to be released in March, right around 1.4.x (planned) release, I suspect a number of people may give it a try. The change is only adding the command line property, non breaking.
> @dblock would you mind if I submit a small patch for 1.3.x+ so it could be run on JDK 18? Thank you
> PS: To clarify why, JDK 18 is scheduled to be released in March, right around 1.4.x (planned) release, I suspect a number of people may give it a try. The change is only adding the command line property, non breaking.
I'm A-OK with anything non-breaking on 1.x.
~~You mean something like adding support to disable the security manager via -Djava.security.manager=disable?~~ (EDIT: I should've read past the first line :) )
I suspect tests will blow up since the test infrastructure leverages a custom SecurityManager via SecureSM. That's going to be more impactful. I'd love some thoughts from @rmuir or @uschindler on this as they are much closer to the JDK security bits than I.
I think the issue is written up correctly. You'll want to set -Djava.security.manager=allow from startup scripts (e.g. .bat/.sh), and from gradle when running tests? Otherwise System.setSecurityManager() will fail.
Lucene uses a custom security manager too, no issues on JDK18. we just initialize it differently than opensearch, right at JVM startup time: -Djava.security.manager=org.apache.lucene.util.TestSecurityManager.
But in your case here, it is a little different because system starts up with no security manager, then parses some config files and maybe does a few evil things on startup, then it installs security manager via System.setSecurityManager(). That's the difference, the deferred initialization. So now for JDK18 you have to set "allow" property for that call to not fail.
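As a rough sketch of how a launcher could decide when to add the property: the class and method names below are hypothetical, not OpenSearch code. The version cut-offs are assumptions taken from this thread: JDK 18 is where the deferred System.setSecurityManager() call starts failing without the flag, and JDK 24 (see [6]) disables the Security Manager permanently, so the flag no longer helps there.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenSearch: computes extra JVM flags for a launcher.
final class SecurityManagerFlags {
    static final String ALLOW = "-Djava.security.manager=allow";

    // Assumption: the flag is needed from JDK 18 on (the deferred install fails without it)
    // and is pointless from JDK 24 on (Security Manager permanently disabled).
    static boolean needsAllowFlag(int jdkFeatureVersion) {
        return jdkFeatureVersion >= 18 && jdkFeatureVersion < 24;
    }

    // Returns the flags to append to the java command line for the given JDK.
    static List<String> extraJvmArgs(int jdkFeatureVersion) {
        List<String> args = new ArrayList<>();
        if (needsAllowFlag(jdkFeatureVersion)) {
            args.add(ALLOW);
        }
        return args;
    }
}
```

A launcher written in Java could call `SecurityManagerFlags.extraJvmArgs(Runtime.version().feature())`; the .bat/.sh scripts and the gradle test configuration would need the equivalent check in their own syntax.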
Separately, as far as alternatives, I can suggest a few things:
- Keep the SystemCallFilter. This is unrelated to security manager and will stop RCE dead in its tracks, as it disables fork()/exec() etc. completely in an irreversible way.
- Look into enhancing the systemd unit to compensate. You can do a lot here, such as allow/block lists of filesystem paths, and more. Recommended introduction. Especially file paths would be great: if you have a directory traversal vulnerability, it is way better to fail with a filesystem error than to transfer some private files. But in addition to file paths, you can also do fancy stuff such as system-call filtering (except for fork/exec which is why you still need to keep part 1), capability drops, etc.
- Consider hardening the Docker environment too. The current entrypoint just runs the shell script, maybe it could instead use the systemd unit, to also benefit from work already done above.
- Adjust existing classloader filtering: example. The filtering-classloader currently integrates with security manager, just as a convenient way to provide a list of allowable classes, but it doesn't have to work this way. It can be changed to get its list of allowed classes some other way, and then things like scripting languages at least keep that protection.
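To illustrate the last point, here is a minimal sketch of a filtering classloader that takes its allow-list as a plain constructor argument instead of asking the security manager. The class name and the shape of the allow-list are hypothetical; this is not the existing OpenSearch implementation.

```java
import java.util.Set;

// Hypothetical sketch: a classloader that only resolves explicitly allowed class names.
final class AllowListClassLoader extends ClassLoader {
    private final Set<String> allowed;

    AllowListClassLoader(ClassLoader parent, Set<String> allowed) {
        super(parent);
        this.allowed = Set.copyOf(allowed); // defensive, immutable copy
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        // Reject anything not on the list before delegating to the parent.
        if (!allowed.contains(name)) {
            throw new ClassNotFoundException(name + " is not on the allow-list");
        }
        return super.loadClass(name, resolve);
    }
}
```

This only restricts lookups that go through this loader (for example a scripting engine resolving class names on behalf of a script); it is not a sandbox on its own, which is why it is listed next to the other measures rather than instead of them.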
I don't recommend directly going the LSM route (AppArmor, SELinux, etc). There's a lot of complexity to those, and it's so system-specific which of them, if any, are even available. I'd start with systemd which is basically universal now on linux systems, and it gets you the biggest wins anyway (e.g. filtering filesystem and so on).
Another win for stuff like ingest-attachment would be to just run the tika server (separate service/container) and have this plugin call out to it with a REST call. IMO it would be better security for using tika and they provide such a server these days. Then the tika could run in its own stricter separate sandbox.
But that strategy won't work for all the code: there's no one-size-fits-all solution. For example, things like analysis modules/plugins are extremely performance sensitive, and really need to just be passed to IndexWriter. At the same time, these plugins have less security risk (compared to e.g. Tika or scripting languages), so it's not a huge deal: they are just exposing lucene analyzers :)
Thank you very much, @rmuir
> I think the issue is written up correctly. You'll want to set -Djava.security.manager=allow from startup scripts (e.g. .bat/.sh), and from gradle when running tests? Otherwise System.setSecurityManager() will fail.
That is right.
I've also made my opinion loudly clear on twitter that removing SecurityManager without replacement is a bad idea for java right now. At least providing a "replacement" first (ideally enabled by default), to help protect server-side apps against the worst vulnerabilities, is really needed. Java is filled with security landmines.
Doubt anything will change on the java side, but I tried. I don't have the resources/energy to write up JEP proposals or anything to try to make real change here though, sorry.
Thanks @rmuir , I think the large part with respect to "what the replacement should be" is still unknown, as it is dictated by Project Loom that is not there yet. But I do 💯 agree on the point: removing SecurityManager without replacement is a bad idea.
if you think of the entire internet (not just opensearch), i really do feel that something similar to the openbsd pledge() api would be at least a minimal replacement. process-wide: drop permissions to fork/exec (RCE), maybe drop network connect() permissions to hosts you don't need, maybe drop permissions to file paths you don't need. In many cases, perhaps the OS can enforce the functionality, in other cases, maybe java needs to do it.
but there's also the separate problem that java includes insecure functionality like JNDI ("landmines"), by default. Besides sandboxing, we need to get good secure defaults here and disable dangerous crap by default. it is a multi-pronged approach.
Do we have a decision on whether OpenSearch will deprecate SecurityManager in a future release or will command line option be used? If it will be deprecated, will there be a replacement? @dblock @nknize @rmuir. thanks,
@Pallavi-AWS the recent (one of many) discussions on OpenJDK mailing list hint there won't be replacements for SecurityManager (very likely, at least) as well as there won't be suitable mechanisms provided for implementing your own. For JDK-18, we explicitly allow SecurityManager but there is no official decision being made on deprecation since no replacement is available.
[1] https://mail.openjdk.java.net/pipermail/security-dev/2022-April/029643.html
i recommend to keep using it until it completely stops working. why would you voluntarily disable a security feature unless you have to?
> Do we have a decision on whether OpenSearch will deprecate SecurityManager
It's already deprecated in the jdk and can be found in the build logs: WARNING: System::setSecurityManager will be removed in a future release.
> will there be a replacement?
This is still being worked on and there are already some great suggestions on this issue. In the meantime, we planned to keep using it until it stops working and will converge on a plan before upgrading to a jdk that removes it completely.
@davidlago Do you have a list of CVEs that were mitigated since the fork by JSM? For example, for the log4j RCE Security Manager + JDK >8 prevented LDAP/RMI connections. I think it will be useful to evaluate any replacement.
Log4j is the big one that comes to mind. I'll take a pass at some of the ones we've seen and see if I can find others. Regardless... it only takes one :) I mean, even just a high severity/critical one averted/mitigated by it is a good reason to not lower our guard there.
Jumping onto this old thread. Generally speaking, JSM definitely adds to the defence in depth. JSM provided some protection, if not complete, for some of the recent CVEs, log4j being one, as you all mentioned. However, given the deprecation path of JSM and the operational overhead of maintaining JSM (as called out in the JEP, https://mail.openjdk.org/pipermail/security-dev/2022-April/029643.html), I liked the alternatives mentioned by @rmuir:
- Class loader protection (also mentioned in the JEP): we are investigating this to add better access controls.
- SystemCallFilter, as mentioned by @rmuir, again sounds promising.
- SELinux: this is a very powerful tool and can give much better protection at the system level than JSM. However, given that it works at the kernel level, we need to figure out how we enable/provide this in our OpenSearch bundles.
In general, like in Lucene core, we should keep support for SecurityManager/AccessController as long as possible. If it gets "disabled" in the JDK (by making everything a NOOP), we do not need to care. If it gets really removed (that may not happen before JDK 21 LTS), we have to deal with that, e.g., using MethodHandles in Lucene's core. In Lucene, my idea is to replace all AccessController#doPrivileged calls by a MethodHandle that is replaced by a noop in recent JDKs. MethodHandles keep the call stack, so the caller-sensitive method is preserving its use. Another idea would be a functional interface.
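A minimal sketch of that MethodHandle idea, with hypothetical names (this is not Lucene code): look up AccessController.doPrivileged by name once, and fall back to calling the action directly if the class or method is ever gone. It assumes the PrivilegedAction interface itself stays available.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.security.PrivilegedAction;
import java.util.function.Supplier;

// Hypothetical helper: runs an action via AccessController.doPrivileged when it exists.
final class Privileged {
    // null means "not available on this JDK": run actions directly instead.
    private static final MethodHandle DO_PRIVILEGED = find();

    private static MethodHandle find() {
        try {
            // Looked up by name so this class has no compile-time link to AccessController.
            Class<?> ac = Class.forName("java.security.AccessController");
            // A full-privilege lookup is required for a caller-sensitive method;
            // the lookup class (Privileged) then counts as the caller.
            return MethodHandles.lookup().findStatic(ac, "doPrivileged",
                    MethodType.methodType(Object.class, PrivilegedAction.class));
        } catch (ReflectiveOperationException | LinkageError e) {
            return null;
        }
    }

    static <T> T run(Supplier<T> action) {
        if (DO_PRIVILEGED == null) {
            return action.get(); // noop path
        }
        PrivilegedAction<T> wrapped = action::get;
        try {
            @SuppressWarnings("unchecked")
            T result = (T) DO_PRIVILEGED.invoke(wrapped);
            return result;
        } catch (RuntimeException | Error e) {
            throw e;
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```

Because the helper itself becomes the privileged caller, a real version would have to stay package-private inside the library so that it does not become an open door for other code.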
no need for methodhandles or any crazy shit like that. The JEP tells you that they won't remove stuff in this way, they will make things no-ops: read it.
Please read what I said: "If it gets really removed (that may not happen before JDK 21 LTS), we have to deal with that, e.g., using MethodHandles in Lucene's core"
Thanks.
> In general, like in Lucene core, we should keep support for SecurityManager/AccessController as long as possible.
My question is still why? That's why I asked @davidlago of what actual CVEs JSM mitigated. There's some effort to "move security into core", with that, @rmuir @uschindler what is your argument for not removing JSM?
I was talking about Lucene. We can't remove the AccessController.doPrivileged blocks, as otherwise downstream projects like yours won't be able to differentiate between actions triggered by Lucene or malicious code. So Lucene will remove our parts later. We don't do security, we just provide the plugin points like any other Java library should have done, too (there are some which ignored AccessController, but most of them added correct doPrivileged blocks).
So in short: Lucene will keep its blocks forever (although they may become noops). Most other libraries do the same. At a moment when Java 11 and Java 17 are still widely used, please, please leave the current code enabled. The "why" was explained before.
An additional idea that came to me over the summer while talking at conferences, improving the class loader / instrumentation variant: Forbiddenapis may be used as a JVM agent (or for plugins/scripts in their classloader). If there's need, the Forbiddenapis signature files could be packaged with an application and, through bytecode instrumentation, it could deny loading classes (also bytecode generated at runtime) that use specific signatures such as JNDI or Java Serialization. I think that can be done easily with some code around Forbiddenapis to invoke Checker.
> My question is still why? That's why I asked @davidlago of what actual CVEs JSM mitigated. There's some effort to "move security into core", with that, @rmuir @uschindler what is your argument for not removing JSM?
Because currently I don't see replacements for a lot of the functionality. Meanwhile the protection still works, so why discard the only security mechanism that you have? I think I explained above, but to summarize for specific vulnerabilities that are of concern (e.g. have happened before): in a world without a security manager, I think the easiest win is to harden the systemd service. Currently it is very weak and insecure: https://github.com/opensearch-project/OpenSearch/blob/main/distribution/packages/src/common/systemd/opensearch.service
These are some of the historically problematic issues that the security manager prevents... aka the worst-of-the-worst:
- RCE: there's some protection by seccomp etc via systemcallfilter which disables fork/exec at OS level. So even without security manager, there is at least basic protection from allowing someone to execute coin miner or whatever. This piece has to remain as you can't block fork/exec with systemd seccomp filters... But I'd still recommend to start hardening the systemd service with SystemCallFilter= to take it further (the seccomp bpf rules will "nest" just fine).
- File/Directory traversal et al: currently the security manager is the only thing restricting the filesystem. opensearch does not need to be able to access files in users home directories or anything like that. If there is a bug in the code that would allow this, instead of accessing users private files, security manager will deliver a SecurityException. but, alternatively, the directories that can be read and written can be nicely restricted with systemd service as well (stuff like ReadWritePaths=)
I recommend looking at a secure systemd service as an example, and comparing it to the current one, here is a good one: https://github.com/archlinux/svntogit-community/blob/packages/dnscrypt-proxy/trunk/dnscrypt-proxy.service
You may also use systemd-analyze security opensearch.service to track your progress, it will suggest improvements.
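As a starting point, a hardening drop-in along these lines could be tried. This is an untested sketch, not a recommended final configuration: the drop-in path and the data/log directories are assumptions, and every directive should be checked against a real OpenSearch startup and re-scored with systemd-analyze security.

```ini
# /etc/systemd/system/opensearch.service.d/hardening.conf  (hypothetical drop-in path)
[Service]
# Mount the file system hierarchy read-only for this service, except the paths below
ProtectSystem=strict
# Assumed data and log directories; adjust to the actual installation
ReadWritePaths=/var/lib/opensearch /var/log/opensearch
# Make users' home directories inaccessible
ProtectHome=true
PrivateTmp=true
PrivateDevices=true
NoNewPrivileges=true
# Empty assignment drops all capabilities
CapabilityBoundingSet=
# Allow-list the system call group intended for typical services
SystemCallFilter=@system-service
# Return an error instead of killing the process on a filtered call
SystemCallErrorNumber=EPERM
```

The fork/exec blocking done by the in-process system call filter still has to stay, as noted above; the unit-level filter only narrows things further.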
I dropped in the current opensearch.service and analyzed it so you can see what I mean:
→ Overall exposure level for opensearch.service: 8.8 EXPOSED 🙁 Full output: opensearch-analysis.txt
Compare this to e.g. my nginx.service:
→ Overall exposure level for nginx.service: 0.9 SAFE 😀
Full output: nginx.analysis.txt
Thanks @rmuir, hardening systemd would make sense in any case, w/o JSM, maybe you could create an issue for that for OpenSearch? We also have to support Windows and other distributions / platforms where systemd is not available, so it leaves us mostly with JSM only (for now at least).
> We also have to support Windows and other distributions / platforms where systemd is not available, so it leaves us mostly with JSM only (for now at least).
Are you sure about that? https://devblogs.microsoft.com/commandline/systemd-support-is-now-available-in-wsl/
> > We also have to support Windows and other distributions / platforms where systemd is not available, so it leaves us mostly with JSM only (for now at least).
> Are you sure about that? https://devblogs.microsoft.com/commandline/systemd-support-is-now-available-in-wsl/
I mean WSL technically is not a Windows native deployment model, plus not everyone is keen on using systemd even when it is supported (I am not a devops guy, but I have frequently seen debates on the matter).
> > > We also have to support Windows and other distributions / platforms where systemd is not available, so it leaves us mostly with JSM only (for now at least).
> > Are you sure about that? https://devblogs.microsoft.com/commandline/systemd-support-is-now-available-in-wsl/
> I mean WSL technically is not Windows native deployment model, plus not everyone keen on using systemd even when it is supported (I am not devops guy, but I frequently have seen debates on the matter).
The problem is that systemd is only supported in WSL2, so you actually need to run OpenSearch as an Ubuntu Linux application inside WSL's Ubuntu distribution (there may be other distributions, but that's the default one shipped by Microsoft). There's no native support in the Windows Win32 NT kernel subsystem. So running OpenSearch on Windows with Java for Windows as a Windows Service can't use it.
Well, java announced it is dropping its os-independent sandboxing tool. And you guys aggressively want to remove it with nothing in place and are asking me to defend why you wouldn't do that?
It is clear you haven't thought this through. I think it's enough to just run well on linux and then run as container on mac/windows. Probably controversial but easy way to reduce complexity and testing.
I've been doing this stuff a long time and never actually seen any serious server running on windows.
You'll have to change something, if you want to keep it secure, seems like the right tradeoff to me.