
Container needs bootstrap values changed Bugref: 1957268

Open dbarkelew opened this issue 7 years ago • 11 comments

Story As a user of a VCH I need to be able to adjust Linux kernel parameters if necessary for a specific workload.

Detail This issue was originally opened specifically to address the max_map_count setting needed for ElasticSearch (which has been mitigated for now with #7790), however it's a general requirement.

This issue should add support for the --sysctl option. This entails:

  • [ ] tether update to unpack the x.y.z form into the /proc/ path and apply the change - I would like this phrased as an extension rather than inlined into current tether code.
  • [ ] cVM configuration update to pass the config
  • [ ] portlayer update to take these options as part of container create (I would like this phrased as a config blob, with a validator that is associated with the cVM bootstrap chosen for the container)
  • [ ] docker personality to unpack the config blob and marshal for portlayer
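For illustration only, the first item (unpacking the x.y.z form into a /proc/ path and applying the change) could look roughly like the sketch below. This is not tether code: the function names `sysctl_key_to_path` and `apply_sysctl` and the `SYSCTL_ROOT` variable are hypothetical, and the sketch assumes no key component contains a literal dot (interface names such as eth0.100 would need different handling).

```sh
#!/bin/sh
# Hypothetical sketch: map a sysctl key in x.y.z form to its /proc/sys path.
# Assumption: no component of the key contains a literal dot.
sysctl_key_to_path() {
    key=$1
    # Reject empty keys, slashes, and leading/trailing/double dots so the
    # resulting path cannot escape /proc/sys.
    case $key in
        ''|*/*|.*|*.|*..*) return 1 ;;
    esac
    printf '/proc/sys/%s\n' "$(printf '%s' "$key" | tr '.' '/')"
}

# Hypothetical sketch: apply a single "key=value" setting.
# SYSCTL_ROOT is an invented prefix (default empty) so the function can be
# exercised against a scratch directory instead of the real /proc.
apply_sysctl() {
    setting=$1
    key=${setting%%=*}
    value=${setting#*=}
    # A setting without '=' leaves key equal to the whole string.
    [ "$key" != "$setting" ] || return 1
    path=$(sysctl_key_to_path "$key") || return 1
    printf '%s\n' "$value" > "${SYSCTL_ROOT:-}$path"
}
```

Writing to the real /proc/sys requires root inside the cVM, which is why the issue places this step in tether rather than in the container process.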

Related #5353 is to add --ulimit support

Original Using VIC 1.1.1

The Elasticsearch container needs the following values changed in the bootstrap to work properly.

ERROR: [2] bootstrap checks failed
[1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
[2]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
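To see whether a given container would trip the same two checks, something like the following sketch can be run inside it. The function names are hypothetical; the thresholds are the ones quoted in the error above, and the script only reads values, so it needs no special privileges.

```sh
#!/bin/sh
# Hypothetical helper: succeed when the actual value meets the minimum.
# "unlimited" (a possible result of `ulimit -n`) always passes.
meets_minimum() {
    actual=$1
    required=$2
    [ "$actual" = "unlimited" ] && return 0
    [ "$actual" -ge "$required" ]
}

# Compare the current limits with the minimums from the error message.
check_es_bootstrap() {
    status=0
    if ! meets_minimum "$(ulimit -n)" 65536; then
        echo "max file descriptors too low: $(ulimit -n)"
        status=1
    fi
    if ! meets_minimum "$(cat /proc/sys/vm/max_map_count)" 262144; then
        echo "vm.max_map_count too low: $(cat /proc/sys/vm/max_map_count)"
        status=1
    fi
    return $status
}
```

Calling `check_es_bootstrap` before starting Elasticsearch gives a quick yes/no without waiting for the bootstrap checks to fail.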

Bugref: 1957268

dbarkelew avatar Sep 11 '17 22:09 dbarkelew

@dbarkelew looks like [1] is already fixed in 1.2, but it can also be set via sysctl -w fs.file-max=<whatever is needed> for 1.1.1. I suspect [2] can also be set via sysctl -w vm.max_map_count=<whatever is needed>. As for running these commands when the container is launched with a different user, something like su -c "sysctl -w vm.max_map_count=<whatever is needed>" (same for the FDs) will likely work. I will be working on a test with this locally.

matthewavery avatar Sep 12 '17 03:09 matthewavery

I can identify two classes of workaround that do not require root or su in the image:

  1. use an suid binary
  2. directly inject a second command into the container

The simplest of (2) that I can think of is to use exec:

$ docker run -dit --user=999 --name=sysctl-test alpine /bin/ash -c 'until [ $(cat /proc/sys/vm/max_map_count) -eq 262144 ]; do echo .;sleep 1;done;echo ready to roll'
8af98b316e7c0a2a11936a74c054bc7fbfd22b41e20bb31c4e2666ead1e62349
$ docker exec -it sysctl-test /bin/ash -c "echo 262144 > /proc/sys/vm/max_map_count"
$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
$ docker logs sysctl-test
.
.
.
.
.
.
ready to roll
$

This has the drawback that a second command is needed from the client side, but the advantage that it's extremely easy to do with a simple wait in an entry script. Solutions that do not require a second client command will all require an suid binary until we add explicit support for configuring specific /proc values.
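A rough sketch of such a wait in an entry script, with a timeout added so the container does not hang forever if the exec never arrives. The function name `wait_for_value` and the timeout argument are illustrative, not part of VIC or Docker; unlike the one-liner above it compares strings rather than integers.

```sh
#!/bin/sh
# Hypothetical helper: poll a file once per second until it contains the
# expected value, giving up after the given number of seconds (default 60).
wait_for_value() {
    file=$1
    expected=$2
    timeout=${3:-60}
    elapsed=0
    while [ "$(cat "$file" 2>/dev/null)" != "$expected" ]; do
        # Stop waiting once the timeout has been used up.
        [ "$elapsed" -lt "$timeout" ] || return 1
        sleep 1
        elapsed=$((elapsed + 1))
    done
}

# Example use in an entry script (left commented out in this sketch):
# wait_for_value /proc/sys/vm/max_map_count 262144 120 && exec "$@"
```

The entry script still runs as the unprivileged user; only the separate docker exec that writes the value needs to run as root.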

hickeng avatar Sep 15 '17 04:09 hickeng

If the gating factor about using root is elasticsearch, not process obligations, you can use an suid bash in the image (ash will not work as it doesn't have the -p flag). THIS EXAMPLE USES SUID BASH AND SECURITY IMPACT SHOULD BE CONSIDERED IF UTILIZED DIRECTLY

$ docker run -it --name sysctl-test debian
root@623da673b76a:/#
root@623da673b76a:/# cp /bin/bash /tmp/bash
root@623da673b76a:/# chmod u+s /tmp/bash
root@623da673b76a:/# adduser testuser
Adding user `testuser' ...
<snip>
root@623da673b76a:/# su - testuser
testuser@623da673b76a:~$
testuser@623da673b76a:~$ /tmp/bash -c "id"
uid=1000(testuser) gid=1000(testuser) groups=1000(testuser)
testuser@623da673b76a:~$ /tmp/bash -p -c "id"
uid=1000(testuser) gid=1000(testuser) euid=0(root) groups=1000(testuser)
testuser@623da673b76a:~$ /tmp/bash -p -c "echo 262144 > /proc/sys/vm/max_map_count"
testuser@623da673b76a:~$ cat /proc/sys/vm/max_map_count
262144
testuser@623da673b76a:~$

As you can see from the example, the -p flag is needed to stop bash from dropping the effective UID (euid) back to testuser.

A security conscious variant of this would be a very small binary that does only that which is explicitly needed - can even hardcode the max_map_count value needed for total rigidity.

hickeng avatar Sep 15 '17 05:09 hickeng

At the very least, this will need to be a documentation task, @stuclem. I suggest we add a section on configuring the cVM OS and add all of the common requirements: swap space, hostname, etc. Some of these will have proper support and others may need some workarounds. We can update the doc as we improve support.

I can also predict that before long a customer will ask to be able to set custom values in the VMX of cVMs deployed. I heard that from customers 2 years ago.

corrieb avatar Sep 23 '17 12:09 corrieb

@hickeng I've created an epic https://github.com/vmware/vic/issues/6418 to cover all cVM guest config tasks. This fits exactly into the kinds of bumps we've committed to fixing through the rest of the year.

corrieb avatar Sep 23 '17 12:09 corrieb

adding to 1.3 and making high priority per @pdaigle

mdubya66 avatar Oct 06 '17 15:10 mdubya66

Also reported on the vmware-code slack instance. So +1 on customer found/impacting.

mlh78750 avatar Nov 15 '17 19:11 mlh78750

Doc aspect is tracked in https://github.com/vmware/vic-product/issues/869. Removing kind/user-doc from this one.

stuclem avatar Jan 11 '18 09:01 stuclem

I have an email for another impacted customer.

dbarkelew avatar Mar 15 '18 18:03 dbarkelew

@hickeng Moving to In Progress since https://github.com/vmware/vic/pull/7790 is open.

anchal-agrawal avatar Apr 26 '18 19:04 anchal-agrawal

Raising to P1 and including in 1.5

mdubya66 avatar May 31 '18 17:05 mdubya66