
Known issues for production deployment

Open mkow opened this issue 5 years ago • 8 comments

This issue lists items that need to be kept in mind as you consider using Graphene in a production deployment scenario.

Issues: (checked means "already fixed on master")

  • [x] Fix all known security issues
    More information: https://github.com/gramineproject/gramine/issues/8.
  • [ ] Documenting possible misuses of Graphene and its limitations
    Graphene has some limitations (some depend on the backend, e.g. under SGX you can't get trusted time), and users should be aware of them.
  • [x] Rewriting old, buggy and unstable subsystems:
    • [x] Filesystem (rewrite done, see gramineproject/graphene#1803. There will be more features added in the future, but the core rewrite is done)
    • [x] Signals (see gramineproject/graphene#2090)
    • [x] ELF parsing/loading (the most important part is done, only LibOS loader left, see gramineproject/graphene#1435)
    • [x] Threading (done in gramineproject/graphene#1949)
    • [x] IPC and checkpointing (done, see gramineproject/graphene#2107)
  • [x] Support for upstreamed SGX driver for Linux
    ~Upstreaming is still in progress, we're blocked on this.~ SGX support made its way to Linux 5.11 and Graphene supports it.
  • [x] Removal of Graphene SGX driver (done in gramineproject/graphene#1997)
    This driver is insecure and dangerous (see its README) and is only a temporary solution. We will drop it once FSGSBASE patches are upstreamed (that's the only functionality currently left in the driver).
  • [x] Logging system + consistent output format
    Currently each subsystem outputs logs in its own ad-hoc format. We also need a better way to control the log level.
  • [x] Splitting Graphene output from app output, same for error codes (partially done)
    Currently those are mixed, which makes the output hard to use in a production setup. Update: Graphene logs can now be redirected to a separate file in the manifest. App stdout and stderr are currently printed to the same host fd and the error codes are still "ANDed", but that's probably not a blocker for production deployments.
  • [x] Protected filesystem
    First version almost done, see gramineproject/graphene#1325. Required for most production use-cases.
  • [x] Protected argv and env
    Using argv and environment from the untrusted world may easily lead to TEE compromise. See gramineproject/graphene#508.
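As a rough illustration of why untrusted argv matters, here is a minimal C sketch of an application-level check that refuses to run unless the host-supplied command line exactly matches one compiled into the (measured) binary. This is not Graphene's actual fix (that is tracked in gramineproject/graphene#508); the function name `argv_matches_expected` and the expected paths are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical expected command line. Because it is compiled into the
 * binary, it is covered by the enclave measurement, unlike host-supplied argv. */
static const char *const EXPECTED_ARGV[] = {"/usr/bin/app", "--config", "/etc/app.conf"};
#define EXPECTED_ARGC (sizeof(EXPECTED_ARGV) / sizeof(EXPECTED_ARGV[0]))

/* Returns 1 if the host-supplied argv exactly matches the expected argv,
 * 0 otherwise. Extra, missing, or modified arguments are all rejected. */
int argv_matches_expected(int argc, const char *const argv[]) {
    assert(argc >= 0);
    if ((size_t)argc != EXPECTED_ARGC)
        return 0;
    for (size_t i = 0; i < EXPECTED_ARGC; i++) {
        if (argv[i] == NULL || strcmp(argv[i], EXPECTED_ARGV[i]) != 0)
            return 0;
    }
    return 1;
}
```

A host that swaps `/etc/app.conf` for an attacker-controlled path, or appends a flag such as `--debug`, makes the check fail. The same idea applies to environment variables, which are equally host-controlled.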

mkow, May 29 '20

Support basic file locking, which is required to enable Spark with Graphene (#437).
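For context, "basic file locking" presumably refers to POSIX advisory locks. Below is a minimal C sketch of taking and releasing an exclusive whole-file lock with `fcntl(F_SETLK)`. Whether Spark (via the JVM) relies on exactly this call is an assumption, and the helper names `try_lock_file` and `unlock_file` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Try to take an exclusive (write) advisory lock on the whole file without
 * blocking. Returns -1 if the lock is held by another process or on error. */
int try_lock_file(int fd) {
    assert(fd >= 0);
    struct flock fl = {0};
    fl.l_type = F_WRLCK;    /* exclusive lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;           /* 0 means "up to end of file, however large" */
    return fcntl(fd, F_SETLK, &fl);
}

/* Release any lock this process holds on the whole file. */
int unlock_file(int fd) {
    assert(fd >= 0);
    struct flock fl = {0};
    fl.l_type = F_UNLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;
    return fcntl(fd, F_SETLK, &fl);
}
```

Note that `fcntl` locks belong to the process, so a second lock attempt from the same process succeeds; contention only shows up across processes, which is exactly what a LibOS has to emulate consistently across its own processes.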

debin-yang, Sep 28 '20

@debin-yang This issue is only to aggregate general issues with Graphene which block it from being used in production for all purposes, not just specific workloads.

mkow, Sep 28 '20

ELF parsing/loading

Isn't this done already? @pwmarcz @mkow

Support for upstreamed SGX driver for Linux

This is done I think. See https://github.com/oscarlab/graphene/pull/2084 (for Graphene proper) and https://github.com/oscarlab/graphene/pull/2165 (for GSC).

dimakuv, Mar 01 '21

ELF parsing/loading

Here's where we are:

  • [x] LibOS: remove dynamic linking (this simplifies LibOS rtld code greatly, and fixes some bugs)
  • [x] LibOS: refactor or rewrite remaining rtld code (at least get rid of gotos between loops)
  • [x] PAL: either remove dynamic linking (pre-link PAL and LibOS before running), or rewrite it (based on musl)

The problems in the PAL code are perhaps less harmful, because it's used only for loading the PAL and LibOS binaries. However, I recall running into issues at least once (the relocation code crashing on CFI directives for a hardcoded return address).

Somewhat related: fix linking of Graphene binaries to enable the use of normal inlining and LTO (see gramineproject/graphene#2179).

EDIT: The second checkbox is also done now.

EDIT: @dimakuv rewrote PAL dynamic linking, so we're done here.

pwmarcz, Mar 01 '21

This is done I think. See gramineproject/graphene#2084 (for Graphene proper) and gramineproject/graphene#2165 (for GSC).

Marked as done.

mkow, Mar 01 '21

@mkow Do we still want to keep this meta-issue open? There is one item left ("Documenting possible misuses of Graphene and its limitations"), and we don't have immediate plans to write a document like this.

@pwmarcz Could you mark your todo item ("rewrite db_rtld in PAL") as solved, now that I've submitted my PRs for it?

dimakuv, Nov 25 '21

Done.

pwmarcz, Nov 25 '21

@dimakuv: I'd keep it and in the meantime try to write up at least a short "secure deployment guidelines" doc, with all the dangers we are aware of clearly listed.

mkow, Nov 25 '21

Looks like the only thing left is this: Documenting possible misuses of Graphene and its limitations

@mkow Can we consider https://github.com/gramineproject/gramine/pull/1194 as fixing it? If yes, then I can add "Fixes 7" to my PR, and we'll automatically close this issue.

dimakuv, Mar 09 '23

Nope, this one is completely different. Your document describes the current Gramine state and limitations from the compatibility point of view, while the one here is about security. Although, reading it now, I think I should have described it better... Anyway, I already have a draft prepared; I just need to finally finish it up.

mkow, Mar 13 '23