docker icon indicating copy to clipboard operation
docker copied to clipboard

Attempt to catch cases of incorrect datadir ownership

Open orangejulius opened this issue 5 years ago • 13 comments

If the Elasticsearch datadir is not owned by the user specified in $DOCKER_USER, Elasticsearch will fail to start.

Since https://github.com/pelias/docker/pull/55 the helper scripts attempt to remedy this situation if the pelias script is run as root (which is not recommended but is often done).

However, if the pelias script is not run as root and the permissions are incorrect, this situation cannot be automatically fixed.

This code attempts to detect that case and recommend the proper command to run (with sudo) and set proper directory permissions.

Connects https://github.com/pelias/docker/issues/31 Connects https://github.com/pelias/docker/issues/73

orangejulius avatar Mar 05 '19 19:03 orangejulius

Heya,

So backing up a bit, I'm trying to remember why we are doing it like this in the first place?

It seems like an elegant solution is to put this at the top of the pelias executable?

# Detect user and group ids for current user
export DOCKER_USER=$(id -u):$(id -g);

So the $DOCKER_USER would simply default to the IDs of the currently executing user.

In order to support sudo, it could be improved as such:

export DOCKER_USER=$(id -u ${SUDO_USER-${USER}}):$(id -g ${SUDO_USER-${USER}});

This would mean that users couldn't easily override the value with a custom ID/GID value, there are two options for that:

  • Detect if DOCKER_USER is already set, if so, use that - we would have to delete existing references to it in .env files
  • Add a new variable called DOCKER_USER_OVERRIDE.

The nice thing about the script above is, it's super simple, portable and leaves local file permissions set to the executing user by default.

The thing I don't like about the existing solution is that it assumes the UID is 1000 and then tries to force the filesystem to 1000, which makes it hard to delete and edit files on a local filesystem when running as a non-1000 user.

One final consideration is what will happen when someone uses docker-compose directly, bypassing the pelias executable. In this case it will 'just work' with overrides and I believe will pass an empty string otherwise.

edit: yep, it also emits a warning, which is nice WARNING: The DOCKER_USER variable is not set. Defaulting to a blank string.

missinglink avatar Mar 07 '19 08:03 missinglink

I think you're on the right track with reworking the permissions. The important criteria we want to meet, roughly in priority order, are:

  • Docker setup works out of the box across platforms
  • Containers do not run as root
  • Docker setup supports running pelias script as root, even if it is not advised
  • Users of the docker setup can override the UID/GID the containers run as if they desire

So I think we can achieve this by setting DOCKER_USER with the sudo-compatible script you described if the DOCKER_USER variable is unset. Then we can remove the definition of DOCKER_USER from all the project .env files, and they will all default to running as the user who ran the pelias script.

orangejulius avatar Mar 07 '19 14:03 orangejulius

:+1: sounds perfect, the only thing I worry about is existing projects with an .env file containing a DOCKER_USER line

Particularly for users who copied an existing project into a new directory and run OSX, in that case they would probably start experiencing an error, but this might be worth doing anyway since it would be correct for future users.

missinglink avatar Mar 07 '19 15:03 missinglink

If data_dir is owned by curent user and group is users (and DOCKER_USER is not set) then pelias elastic start completes and docker-compose ps shows it 'up' and not restarting all the time. However pelias elastic wait returns

elasticsearch did not come up, check config

This is with docker 17.05.0-ce and docker-compose 1.23.2 on debian jessie 8.8.

pelias compose logs elasticsearch shows:

elasticsearch_1 | [2019-03-17T10:09:36,131][INFO ][o.e.x.m.MachineLearningTemplateRegistry] [zGq41Uw] successfully created .ml-state index template elasticsearch_1 | [2019-03-17T10:09:36,179][INFO ][o.e.x.m.MachineLearningTemplateRegistry] [zGq41Uw] successfully created .ml-meta index template elasticsearch_1 | [2019-03-17T10:09:36,232][INFO ][o.e.x.m.MachineLearningTemplateRegistry] [zGq41Uw] successfully created .ml-notifications index template elasticsearch_1 | [2019-03-17T10:09:36,297][INFO ][o.e.x.m.MachineLearningTemplateRegistry] [zGq41Uw] successfully created .ml-anomalies- index template elasticsearch_1 | [2019-03-17T10:09:36,512][INFO ][o.e.l.LicenseService ] [zGq41Uw] license [ee8c0dc9-35e1-4f42-89a6-83022c114cd0] mode [trial] - valid elasticsearch_1 | [2019-03-17T10:10:51,362][INFO ][o.e.n.Node ] [] initializing ... elasticsearch_1 | [2019-03-17T10:10:51,506][INFO ][o.e.e.NodeEnvironment ] [zGq41Uw] using [1] data paths, mounts [[/usr/share/elasticsearch/data (storage-box5:/data/pelias/pelias_data/elasticsearch)]], net usable_space [156.7tb], net total_space [199.9tb], spins? [possibly], types [nfs4] elasticsearch_1 | [2019-03-17T10:10:51,506][INFO ][o.e.e.NodeEnvironment ] [zGq41Uw] heap size [494.9mb], compressed ordinary object pointers [true] elasticsearch_1 | [2019-03-17T10:10:51,522][INFO ][o.e.n.Node ] node name [zGq41Uw] derived from node ID [zGq41UwDQCG7SvBDmnRAwA]; set [node.name] to override elasticsearch_1 | [2019-03-17T10:10:51,522][INFO ][o.e.n.Node ] version[5.6.12], pid[1], build[cfe3d9f/2018-09-10T20:12:43.732Z], OS[Linux/3.16.0-4-amd64/amd64], JVM[Oracle Corporation/OpenJDK 64-Bit Server VM/1.8.0_181/25.181-b13] elasticsearch_1 | [2019-03-17T10:10:51,522][INFO ][o.e.n.Node ] JVM arguments [-Xms2g, -Xmx2g, -XX:+UseConcMarkSweepGC, -XX:CMSInitiatingOccupancyFraction=75, -XX:+UseCMSInitiatingOccupancyOnly, -XX:+AlwaysPreTouch, -Xss1m, -Djava.awt.headless=true, -Dfile.encoding=UTF-8, -Djna.nosys=true, -Djdk.io.permissionsUseCanonicalPath=true, -Dio.netty.noUnsafe=true, -Dio.netty.noKeySetOptimization=true, -Dio.netty.recycler.maxCapacityPerThread=0, -Dlog4j.shutdownHookEnabled=false, -Dlog4j2.disable.jmx=true, -Dlog4j.skipJansi=true, -XX:+HeapDumpOnOutOfMemoryError, -Des.cgroups.hierarchy.override=/, -Xms512m, -Xmx512m, -Des.path.home=/usr/share/elasticsearch] elasticsearch_1 | [2019-03-17T10:10:53,777][INFO ][o.e.p.PluginsService ] [zGq41Uw] loaded module [aggs-matrix-stats] elasticsearch_1 | [2019-03-17T10:10:53,778][INFO ][o.e.p.PluginsService ] [zGq41Uw] loaded module [ingest-common] elasticsearch_1 | [2019-03-17T10:10:53,778][INFO ][o.e.p.PluginsService ] [zGq41Uw] loaded module [lang-expression] elasticsearch_1 | [2019-03-17T10:10:53,778][INFO ][o.e.p.PluginsService ] [zGq41Uw] loaded module [lang-groovy] elasticsearch_1 | [2019-03-17T10:10:53,778][INFO ][o.e.p.PluginsService ] [zGq41Uw] loaded module [lang-mustache] elasticsearch_1 | [2019-03-17T10:10:53,778][INFO ][o.e.p.PluginsService ] [zGq41Uw] loaded module [lang-painless] elasticsearch_1 | [2019-03-17T10:10:53,778][INFO ][o.e.p.PluginsService ] [zGq41Uw] loaded module [parent-join] elasticsearch_1 | [2019-03-17T10:10:53,778][INFO ][o.e.p.PluginsService ] [zGq41Uw] loaded module [percolator] elasticsearch_1 | [2019-03-17T10:10:53,778][INFO ][o.e.p.PluginsService ] [zGq41Uw] loaded module [reindex] elasticsearch_1 | [2019-03-17T10:10:53,778][INFO ][o.e.p.PluginsService ] [zGq41Uw] loaded module [transport-netty3] elasticsearch_1 | [2019-03-17T10:10:53,778][INFO ][o.e.p.PluginsService ] [zGq41Uw] loaded module [transport-netty4] elasticsearch_1 | [2019-03-17T10:10:53,779][INFO ][o.e.p.PluginsService ] [zGq41Uw] loaded plugin [analysis-icu] elasticsearch_1 | [2019-03-17T10:10:53,779][INFO ][o.e.p.PluginsService ] [zGq41Uw] loaded plugin [ingest-geoip] elasticsearch_1 | [2019-03-17T10:10:53,779][INFO ][o.e.p.PluginsService ] [zGq41Uw] loaded plugin [ingest-user-agent] elasticsearch_1 | [2019-03-17T10:10:53,779][INFO ][o.e.p.PluginsService ] [zGq41Uw] loaded plugin [repository-s3] elasticsearch_1 | [2019-03-17T10:10:53,779][INFO ][o.e.p.PluginsService ] [zGq41Uw] loaded plugin [x-pack] elasticsearch_1 | [2019-03-17T10:10:55,702][INFO ][o.e.x.m.j.p.l.CppLogMessageHandler] [controller/123] [Main.cc@128] controller (64 bit): Version 5.6.12 (Build 95c8fcf06f45dc) Copyright (c) 2018 Elasticsearch BV elasticsearch_1 | [2019-03-17T10:10:55,746][INFO ][o.e.d.DiscoveryModule ] [zGq41Uw] using discovery type [single-node] elasticsearch_1 | [2019-03-17T10:10:56,426][INFO ][o.e.n.Node ] initialized elasticsearch_1 | [2019-03-17T10:10:56,427][INFO ][o.e.n.Node ] [zGq41Uw] starting ... elasticsearch_1 | [2019-03-17T10:10:56,569][INFO ][o.e.t.TransportService ] [zGq41Uw] publish_address {x.y.z.a:9300}, bound_addresses {0.0.0.0:9300} elasticsearch_1 | [2019-03-17T10:10:56,596][INFO ][o.e.c.s.ClusterService ] [zGq41Uw] new_master {zGq41Uw}{zGq41UwDQCG7SvBDmnRAwA}{CRHPLW5-R9Wmi04uFc8TGQ}{172.18.0.2}{172.18.0.2:9300}{ml.max_open_jobs=10, ml.enabled=true}, reason: single-node-start-initial-join elasticsearch_1 | [2019-03-17T10:10:56,663][INFO ][o.e.h.n.Netty4HttpServerTransport] [zGq41Uw] publish_address {x.y.z.a:9200}, bound_addresses {0.0.0.0:9200} elasticsearch_1 | [2019-03-17T10:10:56,663][INFO ][o.e.n.Node ] [zGq41Uw] started elasticsearch_1 | [2019-03-17T10:10:56,771][INFO ][o.e.l.LicenseService ] [zGq41Uw] license [ee8c0dc9-35e1-4f42-89a6-83022c114cd0] mode [trial] - valid elasticsearch_1 | [2019-03-17T10:10:56,772][INFO ][o.e.g.GatewayService ] [zGq41Uw] recovered [0] indices into cluster_state

jeremy-rutman avatar Mar 17 '19 10:03 jeremy-rutman

changing to user 'deploy' ( uid 1000 ) and chown of data_dir to deploy (group users) , and adding 'deploy' to docker group (sudo gpasswd -a deploy docker) seems to have taken care of everything and pelias elastic wait now now gives the somewhat ambiguous 'waiting for es service to come up' and then exits. pelias elastic status returns the http status 200 which I imagine is good

jeremy-rutman avatar Mar 17 '19 10:03 jeremy-rutman

Okay, I've revived this PR with some minor changes.

While the earlier comment about autodetecting $DOCKER_USER is a great idea, I believe we should solve that separately.

With more time and clarity, here is the purpose of this PR: to detect (and fix) the case where a user has created the $DATA_DIR with root ownership (or really any ownership that doesn't line up with $DOCKER_USER, even though we try to advise against this in our documentation. As seen in #214 and others this will still occasionally happen :)

I've changed the logic so that the script will now try to detect and fix permission issues, running a chown with sudo if necessary.

@missinglink can you review this and let me know what you think?

orangejulius avatar Sep 15 '20 18:09 orangejulius

I just tested this on my Mac where I had an existing project with DOCKER_USER=501 in my .env file and it worked fine. When I changed this to DOCKER_USER=1000 (the default) I got the following error:

pelias elastic start
user 1000 cannot access elasticsearch directory at /data/pelias/docker/projects/portland-metro
please run 'sudo chown 1000 /data/pelias/docker/projects/portland-metro/elasticsearch'

missinglink avatar Sep 15 '20 18:09 missinglink

Right, and just to clarify no matter how much we simplify the process of configuring/determining $DOCKER_USER, we will still need a PR like this that checks that all the permissions line up because:

  • users may create directories on their own with incorrect permissions
  • Docker itself will create directories owned by root when a bind mount is configured but the directory doesn't exist

I should be able to test it on my Mac but would love a double check :)

As for group ownership, I don't believe that matters for Pelias and our Docker setup, but some people may care about group ownership. As it stands this PR will run chown $DOCKER_USER $DATA_DIR which would automatically set the group owner if $DOCKER_USER contains one (the format of $DOCKER_USER can be just $UID or $UID:$GID and Docker will happily accept it).

We could either leave things like this, or filter out any group in $DOCKER_USER and only set the user permissions. I think this would potentially be less intrusive, but again it's hard to judge what the most commonly expected behavior would be.

orangejulius avatar Sep 15 '20 18:09 orangejulius

If I switch back to master and use DOCKER_USER=1000 then elasticsearch seems to start up fine 🤷 So it seems weird that it would error in this case?

gid
uid=501(peter) gid=20(staff) groups=20(staff),5(operator),12(everyone),61(localaccounts),79(_appserverusr),80(admin),81(_appserveradm),98(_lpadmin),33(_appstore),100(_lpoperator),204(_developer),250(_analyticsusers),395(com.apple.access_ftp),398(com.apple.access_screensharing),399(com.apple.access_ssh),400(com.apple.access_remote_ae)
cat .env

COMPOSE_PROJECT_NAME=pelias
DATA_DIR=/data/pelias/docker/projects/portland-metro
DOCKER_USER=1000
ls -lah /data/pelias/docker/projects/portland-metro/elasticsearch
total 0
drwxr-xr-x   3 peter  staff    96B Jul 30 15:54 .
drwxr-xr-x  14 peter  staff   448B Jul 29 16:53 ..
drwxrwxr-x   3 peter  staff    96B Jul 30 15:54 nodes

missinglink avatar Sep 15 '20 18:09 missinglink

Try deleting the elasticsearch directory completely and then re-creating it with root ownership. You'll find that pelias elastic start runs but Elasticsearch never comes up, with the same errors as shown in https://github.com/pelias/docker/issues/214#issuecomment-692745815.

Alternatively, try running one of the importers, and I bet they will fail when they try to actually write to Elasticsearch.

orangejulius avatar Sep 15 '20 18:09 orangejulius

It's late for me here, so I'm maybe not explaining well.

I'm :+1: for adding a check for the case where the local directory is not readable by the DOCKER_USER but I don't think this is a reliable method of determining that.

For instance, I can create the DATA_DIR as root and chmod it 777, this code would still error and tell me to change it to sudo chown 1000 /data/pelias/docker/projects/portland-metro/elasticsearch.

And my local user is 501 🤷 there's still a lot of edge cases that can go wrong.

missinglink avatar Sep 15 '20 18:09 missinglink

Maybe time to revisit this, I was thinking we could test -w the directories inside the container to check that the DOCKER_USER has write permissions with something like this?

pelias compose run elasticsearch test -w /usr/share/elasticsearch/data

echo $?
0

missinglink avatar Oct 16 '20 11:10 missinglink

I'm actually hoping this won't be needed in practice any more. I suppose if people ran mkdir ./data (or however they make the data directory) with sudo or as root, they could get into a situation where the processes in the Docker containers don't have permission to write, but I hope that's rare.

Maybe it's wishful thinking, and in that case we can do something like you suggest.

orangejulius avatar Oct 16 '20 13:10 orangejulius