Known issues for production deployment
This issue lists items that need to be kept in mind as you consider using Graphene in a production deployment scenario.
Issues: (checked means "already fixed on master")
- [x] Fix all known security issues
  More information: https://github.com/gramineproject/gramine/issues/8.
- [ ] Documenting possible misuses of Graphene and its limitations
  Graphene has some limitations (some depend on the backend, e.g. under SGX you can't get trusted time) and users should be aware of them.
- [x] Rewriting old, buggy and unstable subsystems:
  - [x] Filesystem (rewrite done, see gramineproject/graphene#1803. There will be more features added in the future, but the core rewrite is done)
  - [x] Signals (see gramineproject/graphene#2090)
  - [x] ELF parsing/loading (the most important part is done, only LibOS loader left, see gramineproject/graphene#1435)
  - [x] Threading (done in gramineproject/graphene#1949)
  - [x] IPC and checkpointing (done, see gramineproject/graphene#2107)
- [x] Support for upstreamed SGX driver for Linux
  ~Upstreaming is still in progress, we're blocked on this.~ SGX support made its way to Linux 5.11 and Graphene supports it.
- [x] Removal of Graphene SGX driver (done in gramineproject/graphene#1997)
  This driver is insecure and dangerous (see its README) and is only a temporary solution. We will drop it once FSGSBASE patches are upstreamed (that's the only functionality currently left in the driver).
- [x] Logging system + consistent output format
  Currently all subsystems output logs in a totally random fashion. We also need a better way to control the log level.
- [x] Splitting Graphene output from app output, same for error codes (partially done)
  Currently those are mixed, which makes the output not really useful in a production setup. Update: Graphene logs can now be redirected to a separate file in the manifest. App stdout and stderr are currently printed to the same host fd and the error codes are still "ANDed", but that is probably not a blocker for production deployments.
- [x] Protected filesystem
  First version almost done, see gramineproject/graphene#1325. Required for most production use-cases.
- [x] Protected argv and env
  Using argv and environment from the untrusted world may easily lead to TEE compromise. See gramineproject/graphene#508.
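
To make the last item concrete, here is a minimal sketch of the general idea behind a protected environment: values fixed at build time always win, and only an explicit allowlist of host variables is forwarded. This is not Graphene's actual mechanism. The names `TRUSTED_ENV`, `PASSTHROUGH` and `build_enclave_env` are hypothetical, and treating `LANG` as harmless is an assumption made only for this example.

```python
# Illustrative only: all names and values below are assumptions for this sketch.

# Variables whose values are fixed at build time and never taken from the host.
TRUSTED_ENV = {"LD_LIBRARY_PATH": "/lib", "HOME": "/"}

# Host variables assumed harmless enough to forward unchanged.
PASSTHROUGH = {"LANG"}


def build_enclave_env(host_env):
    """Return the environment the app should see, dropping untrusted host values."""
    env = dict(TRUSTED_ENV)  # trusted values always take precedence
    for name in PASSTHROUGH:
        if name in host_env and name not in env:
            env[name] = host_env[name]
    return env


# A malicious host tries to inject a preloaded library and redirect the library path.
host_env = {"LD_PRELOAD": "/tmp/evil.so", "LD_LIBRARY_PATH": "/tmp", "LANG": "C"}
print(build_enclave_env(host_env))
```

In this sketch `LD_PRELOAD` is dropped because it is not on the allowlist, and the host's `LD_LIBRARY_PATH` is ignored in favor of the trusted value. The same reasoning applies to argv.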
Support "Basic file locking function support which is required to enable Spark with Graphene" (#437)
@debin-yang This issue is only to aggregate general issues with Graphene which block it from being used in production for all purposes, not just specific workloads.
> ELF parsing/loading
Isn't this done already? @pwmarcz @mkow
> Support for upstreamed SGX driver for Linux
This is done I think. See https://github.com/oscarlab/graphene/pull/2084 (for Graphene proper) and https://github.com/oscarlab/graphene/pull/2165 (for GSC).
> ELF parsing/loading
Here's where we are:
- [x] LibOS: remove dynamic linking (this simplifies LibOS rtld code greatly, and fixes some bugs)
- [x] LibOS: refactor or rewrite remaining rtld code (at least get rid of gotos between loops)
- [x] PAL: either remove dynamic linking (pre-link PAL and LibOS before running), or rewrite it (based on musl)
The problems in PAL code are perhaps less harmful, because it's used only for loading PAL and LibOS binaries. However, I recall running into issues at least once (the relocation code crashing on CFI directives for hardcoded return address).
Somewhat related: fix linking of Graphene binaries to enable use of normal inline and LTO (see gramineproject/graphene#2179).
EDIT: The second checkbox is also done now.
EDIT: @dimakuv rewrote PAL dynamic linking, so we're done here.
> This is done I think. See gramineproject/graphene#2084 (for Graphene proper) and gramineproject/graphene#2165 (for GSC).
Marked as done.
@mkow Do we still want to keep this meta-issue open? There is one item left ("Documenting possible misuses of Graphene and its limitations"), and we don't have immediate plans to write a document like this.
@pwmarcz Could you mark your todo item ("rewrite db_rtld in PAL") as solved, after I submitted my PRs on this?
Done.
@dimakuv: I'd keep it and in the meantime try to write up at least a short "secure deployment guidelines" doc, with all the dangers we are aware of clearly listed.
Looks like the only thing left is this: Documenting possible misuses of Graphene and its limitations
@mkow Can we consider https://github.com/gramineproject/gramine/pull/1194 as fixing it? If yes, then I can add "Fixes 7" to my PR, and we'll automatically close this issue.
Nope, this one is completely different? Your document describes the current Gramine state and its limitations from the compatibility point of view; the one here is about security. Although reading it now, I think I should have described it better... Anyways, I have a draft prepared already, I just need to finally finish it up.