vic
Container needs bootstrap values changed Bugref: 1957268
Story
As a user of a VCH, I need to be able to adjust Linux kernel parameters if necessary for a specific workload.
Detail
This issue was originally opened specifically to address the max_map_count setting needed for ElasticSearch (which has been mitigated for now with #7790); however, it is a general requirement.
This issue should add support for the --sysctl option. This entails:
- [ ] tether update to unpack the x.y.z form into the /proc/ path and apply the change - I would like this phrased as an extension rather than inlined into current tether code.
- [ ] cVM configuration update to pass the config
- [ ] portlayer update to take these options as part of container create (I would like this phrased as a config blob, with a validator that is associated with the cVM bootstrap chosen for the container)
- [ ] docker personality to unpack the config blob and marshal for portlayer
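For the tether item above, here is a rough sketch of the unpack-and-apply step. It assumes the simple rule that each dot in the key becomes a path separator under /proc/sys. The function names and the SYSCTL_ROOT override are hypothetical and exist only for illustration; the real extension would live in tether, not in a shell script.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; none of these names exist in VIC.

# Root under which sysctl keys live. Overridable so the logic can be tried
# against a scratch directory instead of the real /proc/sys.
SYSCTL_ROOT="${SYSCTL_ROOT:-/proc/sys}"

# sysctl_key_to_path KEY
# Map a dotted key such as vm.max_map_count to $SYSCTL_ROOT/vm/max_map_count.
# Rejects empty keys, leading/trailing/double dots, and any character outside
# [A-Za-z0-9_.-], which also rules out "/" and ".." path tricks.
# Limitation: keys whose components themselves contain dots (for example some
# VLAN interface names) are not handled by this simple mapping.
sysctl_key_to_path() {
    key=$1
    case $key in
        ''|.*|*.|*..*|*[!A-Za-z0-9_.-]*) return 1 ;;
    esac
    printf '%s/%s\n' "$SYSCTL_ROOT" "$(printf '%s' "$key" | tr '.' '/')"
}

# apply_sysctl KEY=VALUE
# Split the setting, resolve the path, and write the value.
# Fails if the setting is malformed or the target file does not already exist.
apply_sysctl() {
    setting=$1
    case $setting in
        *=*) ;;
        *) return 1 ;;
    esac
    key=${setting%%=*}
    value=${setting#*=}
    path=$(sysctl_key_to_path "$key") || return 1
    [ -f "$path" ] || return 1
    printf '%s\n' "$value" > "$path"
}
```

With the default root, `apply_sysctl vm.max_map_count=262144` would write to /proc/sys/vm/max_map_count and needs the same privileges as the manual echo. Refusing to create files that do not already exist is a deliberate choice here, so that a typo in the key fails loudly.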
Related
#5353 is to add --ulimit support
Original
Using VIC 1.1.1
The Elasticsearch container needs the following values changed in the bootstrap to work properly.
ERROR: [2] bootstrap checks failed
[1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
[2]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
Bugref: 1957268
@dbarkelew looks like [1] is already fixed in 1.2, but it can also be set via sysctl -w fs.file-max=<whatever is needed> for 1.1.1. [2] I suspect can also be set via sysctl -w vm.max_map_count=<whatever is needed>. As for running these commands when the container is launched with a different user, something like su -c "sysctl -w vm.max_map_count=<whatever is needed>" (same for the FDs) will likely work. I will be working on a test with this locally.
I can identify two classes of workaround that do not require root or su in the image:
1. use an suid binary
2. directly inject a second command into the container
The simplest of (2) that I can think of is to use exec:
$ docker run -dit --user=999 --name=sysctl-test alpine /bin/ash -c 'until [ $(cat /proc/sys/vm/max_map_count) -eq 262144 ]; do echo .;sleep 1;done;echo ready to roll'
8af98b316e7c0a2a11936a74c054bc7fbfd22b41e20bb31c4e2666ead1e62349
$ docker exec -it sysctl-test /bin/ash -c "echo 262144 > /proc/sys/vm/max_map_count"
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
$ docker logs sysctl-test
.
.
.
.
.
.
ready to roll
$
This has the drawback that a second command is needed from the client side, but the advantage that it is extremely easy to do with a simple wait in an entry script.
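The wait in an entry script could look something like the sketch below. The function name, the default timeout, and the trailing usage lines are assumptions for illustration. It bounds the loop from the example above so the container fails instead of waiting forever if the second command never arrives.

```shell
#!/bin/sh
# Hypothetical entry-script helper; the name and defaults are illustrative.

# wait_for_value FILE EXPECTED [TIMEOUT_SECONDS]
# Poll FILE once per second until its content equals EXPECTED.
# Returns 0 on a match, 1 if TIMEOUT_SECONDS (default 60) elapse first.
wait_for_value() {
    file=$1
    expected=$2
    remaining=${3:-60}
    while [ "$(cat "$file" 2>/dev/null)" != "$expected" ]; do
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Possible use at the end of an entry script (left commented out here):
#   wait_for_value /proc/sys/vm/max_map_count 262144 120 || exit 1
#   exec "$@"
```

The client still has to issue the docker exec that writes the value; this only covers the waiting side inside the container.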
Solutions that do not require a second client command will all require an suid binary until we add explicit support for configuring specific /proc values.
If the gating factor about using root is elasticsearch, not process obligations, you can use an suid bash in the image (ash will not work as it doesn't have the -p flag).
THIS EXAMPLE USES SUID BASH AND SECURITY IMPACT SHOULD BE CONSIDERED IF UTILIZED DIRECTLY
$ docker run -it --name sysctl-test debian
root@623da673b76a:/#
root@623da673b76a:/# cp /bin/bash /tmp/bash
root@623da673b76a:/# chmod u+s /tmp/bash
root@623da673b76a:/# adduser testuser
Adding user `testuser' ...
<snip>
root@623da673b76a:/# su - testuser
testuser@623da673b76a:~$
testuser@623da673b76a:~$ /tmp/bash -c "id"
uid=1000(testuser) gid=1000(testuser) groups=1000(testuser)
testuser@623da673b76a:~$ /tmp/bash -p -c "id"
uid=1000(testuser) gid=1000(testuser) euid=0(root) groups=1000(testuser)
testuser@623da673b76a:~$ /tmp/bash -p -c "echo 262144 > /proc/sys/vm/max_map_count"
testuser@623da673b76a:~$ cat /proc/sys/vm/max_map_count
262144
testuser@623da673b76a:~$
As you can see from the example, the -p flag is needed to avoid dropping the effective UID back to testuser.
A security-conscious variant of this would be a very small binary that does only what is explicitly needed; it could even hardcode the required max_map_count value for total rigidity.
At the very least, this will need to be a documentation task, @stuclem. I suggest we add a section on configuring the cVM OS and add all of the common requirements: swap space, hostname, etc. Some of these will have proper support and others may need some workarounds. We can update the doc as we improve support.
I can also predict that before long a customer will ask to be able to set custom values in the VMX of cVMs deployed. I heard that from customers 2 years ago.
@hickeng I've created an epic https://github.com/vmware/vic/issues/6418 to cover all cVM guest config tasks. This fits exactly into the kinds of bumps we've committed to fixing through the rest of the year.
adding to 1.3 and making high priority per @pdaigle
Also reported on the vmware-code slack instance. So +1 on customer found/impacting.
Doc aspect is tracked in https://github.com/vmware/vic-product/issues/869. Removing kind/user-doc from this one.
I have an email for another impacted customer.
@hickeng Moving to In Progress since https://github.com/vmware/vic/pull/7790 is open.
Raising to P1 and including in 1.5