firecracker icon indicating copy to clipboard operation
firecracker copied to clipboard

jailer: check if exec file exists in the jail dir before copy

Open valighita opened this issue 1 year ago • 7 comments

Changes

Check whether the exec file already exists in the jail dir before copying it.

Reason

While trying to spawn multiple instances in parallel (more than 500), we're seeing contention while the jailer tries to copy the exec file to each of the instances jail dirs (some VMs end up starting in more than 20 seconds). Hard-linking the exec file in the jail dirs before starting the VMs avoids the issue, but the copy operation in the jailer needs to be skipped.

License Acceptance

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check CONTRIBUTING.md.

PR Checklist

  • [ ] If a specific issue led to this PR, this PR closes the issue.
  • [x] The description of changes is clear and encompassing.
  • [x] Any required documentation changes (code and docs) are included in this PR.
  • [ ] API changes follow the Runbook for Firecracker API changes.
  • [ ] User-facing changes are mentioned in CHANGELOG.md.
  • [x] All added/changed functionality is tested.
  • [ ] New TODOs link to an issue.
  • [x] Commits meet contribution quality standards.

  • [ ] This functionality cannot be added in rust-vmm.

valighita avatar Jan 31 '24 10:01 valighita

Hi @valighita , Thank you for your PR!

As the comment in the code says, Jailer does copy instead of hardlink intentionally for a reason of threat model. This can be a breaking change on things that we ensures our customer here.

Also, we have another concern. If the original file has changed since the last copy and the same filename already exists in the jailed environment, Jailer does not copy it.

For these reasons, when this change would be accepted, we would like to make this behavior optional.

To understand your use case, could you please share more details?

  • Could you please give us more details of your use case? Are you reusing a Jailer to run multiple uVMs sequentially? Or just hardlinking the Firecracker binary to different jailed environments to run multiple uVMs in parallel?
  • Is it fine to hardlink the Firecracker binary in multiple jailed environments in your use case from the security perspective?
  • Could you please elaborate "contention" you mentioned? IIUC, the copy operation itself is immutable from the original file perspective and does not take any locks.

Thanks,

zulinx86 avatar Feb 05 '24 12:02 zulinx86

Hi @zulinx86, thank you for looking over the PR.

Regarding your questions:

  1. We're not reusing the environments, just linking the file in advance to multiple jailer dirs in order to avoid the copy.
  2. From the security perspective, it's ok if the Firecracker executable is shared between multiple processes.
  3. We're seeing long start times when running multiple instances in parallel and we've measured the time of various steps. We've noticed that the copy operation was the issue.

To make it optional, do you think it's better to use an env variable or a command line argument?

Thank you!

valighita avatar Feb 05 '24 12:02 valighita

@valighita Thanks for your reply!

From the security perspective, it's ok if the Firecracker executable is shared between multiple processes.

So does it mean your use case is single-tenant and only running trusted code?

We're seeing long start times when running multiple instances in parallel and we've measured the time of various steps. We've noticed that the copy operation was the issue.

That makes sense. So sounds like disk I/O throughput contention or something.

To make it optional, do you think it's better to use an env variable or a command line argument?

A command line argument would be preferable! It would be better to:

  • print a warning message to stderr when using the CLI argument.
  • explain its usage and risk in docs and CHANGELOG. (like, When sharing a Firecracker binary file across multiple protected environments using hardlink, it does not ensure that the new process will not share memory with any other Firecracker process. It also does not ensure that the specified source file and the file already existing in the jailed environment are same. It is highly recommended not to use it when running either multi-tenant or untrusted code, or both.)

Thanks,

zulinx86 avatar Feb 05 '24 13:02 zulinx86

And it would be super nice if you were able to share your performance comparison between with and without this change!!

zulinx86 avatar Feb 05 '24 14:02 zulinx86

Ok. I will submit the changes and the performance comparison when they are ready.

But it's not really clear for us why this can cause a security concern. In theory, only the RO/exec sections of the executable would be shared between different firecracker instances, and any attempt to change the protections to allow writes would lead to a COW operation. This is also true for any shared library (libc for example). Or is this not necessarily a security concern but a policy to not have shared memory?

Thank you, Valentin,

valighita avatar Feb 08 '24 13:02 valighita

But it's not really clear for us why this can cause a security concern. In theory, only the RO/exec sections of the executable would be shared between different firecracker instances, and any attempt to change the protections to allow writes would lead to a COW operation. This is also true for any shared library (libc for example). Or is this not necessarily a security concern but a policy to not have shared memory?

Sorry for my poor explanation. Yes, that is our threat model, a kind of policy. In theory, if everything were working as intended without bugs, that couldn't be a security issue. We're in a stance of defense in depth, since Firecracker is purpose-built for multi-tenant. You may already know, but Firecracker is statically linked and built with musl toolchain by default to not share as much stuff as possible. I appreciate your understanding.

Thanks, Takahiro

zulinx86 avatar Feb 09 '24 10:02 zulinx86

Having another discussion with other maintainers today, we re-confirmed that we wouldn't like to share any memory in production. This is because it could be vulnerable to some micro-architectural side-channel attacks (e.g. cache-timing attacks like FLUSH+RELOAD attack: https://www.usenix.org/conference/usenixsecurity14/technical-sessions/presentation/yarom). Thanks.

zulinx86 avatar Feb 12 '24 15:02 zulinx86

Hello @valighita ,

We hope to find you well. Have you had a chance to look at the previous update?

We don't receive updates for about 5 months and we think it may not be relevant to you anymore. We are closing this PR, but please feel free to reopen it or reach us out if you have any questions or updates.

Thanks for your understanding.

zulinx86 avatar Jul 01 '24 11:07 zulinx86