
piecemeal behavior

Open · xenoterracide opened this issue on Dec 11 '17 · 17 comments

Expected behavior

Be able to consume just the EBS volume mounting.

Actual behavior

It requires that I set up a lot of access I don't want to give or trust.

Information

It seems that currently the only way to use for-aws is to set it up with CloudFormation, along with a lot of other services. I really only want two things: 1) the relocatable EBS volumes, and 2) some tool/documentation for running an autoscaling swarm where you don't know the IPs ahead of time. I don't trust other people's AMIs for host management (really, I just don't trust that security updates will be appropriately managed on the host OS), and I don't want to use CloudFormation (I'm using Terraform). If any of this is already possible, I'd like to see additional docs for it: https://docs.docker.com/docker-for-aws/persistent-data-volumes/#relocatable-cloudstor-volumes

xenoterracide avatar Dec 11 '17 23:12 xenoterracide

You can consume EBS volumes by just installing the cloudstor plugin on your AWS nodes set up with Terraform. You can install the plugin using:

docker plugin install --alias cloudstor:aws --grant-all-permissions docker4x/cloudstor:17.06.2-ee-5-aws1 CLOUD_PLATFORM=AWS AWS_REGION=[region] AWS_STACK_ID=[swarmid] EFS_SUPPORTED=0 DEBUG=1

region needs to be specified in the format us-east-1 (or any other AWS region). swarmid can be any unique string within your AWS subscription, like "myswarm1".
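
For instance, in a Terraform setup you could run the install from instance user data at boot (a minimal sketch using the example values above; it assumes Docker is already installed on the AMI):

#!/bin/sh
# one-time setup on each node provisioned by Terraform (region/stack values are placeholders)
docker plugin install --alias cloudstor:aws --grant-all-permissions \
    docker4x/cloudstor:17.06.2-ee-5-aws1 \
    CLOUD_PLATFORM=AWS AWS_REGION=us-east-1 AWS_STACK_ID=myswarm1 \
    EFS_SUPPORTED=0 DEBUG=1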

ddebroy avatar Dec 11 '17 23:12 ddebroy

But does that stack have to be valid? Will I have to give permissions to CloudFormation? And if not, why is it necessary at all?

xenoterracide avatar Dec 12 '17 00:12 xenoterracide

The AWS_STACK_ID can be any string that you use to identify all nodes in a particular cluster. It does not have to be associated with anything or be valid in any sense; it is essentially a tag.

You do need to configure the following IAM policy statement for each node:

"Action": [
    "ec2:CreateTags",
    "ec2:AttachVolume",
    "ec2:DetachVolume",
    "ec2:CreateVolume",
    "ec2:DeleteVolume",
    "ec2:DescribeVolumes",
    "ec2:DescribeVolumeStatus",
    "ec2:CreateSnapshot",
    "ec2:DeleteSnapshot",
    "ec2:DescribeSnapshots"
],
"Effect": "Allow",
"Resource": "*"

ddebroy avatar Dec 12 '17 00:12 ddebroy

Hmm... OK then. Maybe this should be better documented; just adding the command and the permissions for it would probably be enough. It actually took me quite a long time to find even a half-working command to try to install the plugin.

Oh, and does this auto-format volumes, given that it can create/delete them?

My one other question on this: having read https://docs.docker.com/docker-for-aws/scaling/#aws-console, I'm currently imagining there is some middleware that makes multiple manager nodes register with each other without knowing the IPs ahead of time. Is that as easy? Or is that CloudFormation?

xenoterracide avatar Dec 12 '17 00:12 xenoterracide

The swarm cluster creation is currently done through utility containers configured to run on each node through the CloudFormation template. However, it should be possible to replicate that logic with a few scripts if you want to roll your own, and pass those in through your Terraform templates to run on each of your nodes.
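
A rough sketch of that logic, in case you roll your own (the discovery helpers here are hypothetical placeholders; the real implementation lives in the closed-source utility containers discussed below):

#!/bin/sh
# join an existing swarm if a leader is discoverable, otherwise start a new one
STATE=$(docker info --format '{{.Swarm.LocalNodeState}}')
if [ "$STATE" != "active" ]; then
    LEADER_IP=$(your_discovery_lookup)    # e.g. an EC2 tag or a DynamoDB item you control
    if [ -z "$LEADER_IP" ]; then
        docker swarm init                 # first node becomes the leader
    else
        TOKEN=$(your_token_lookup)        # distribute the join token out of band
        docker swarm join --token "$TOKEN" "$LEADER_IP:2377"
    fi
fi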

ddebroy avatar Dec 12 '17 00:12 ddebroy

Hi @xenoterracide, I totally agree with you about using Terraform instead of relying on the CF template. In fact, I've recently replicated the entire CloudFormation template using Terraform, to be able to control and customise security groups, make it work with an ALB instead of an ELB (as well as share a single ALB and EFS between several swarm clusters), and other nice things. It's working great so far; I'm currently documenting it and will share it publicly as soon as I get approval from my employer.

Just wanted to confirm that, as @ddebroy says, Docker for AWS doesn't need the stack ID to be a valid CF stack to work. What you will need is a DynamoDB table and two SQS queues, along with the policies to allow the swarm nodes to use them. The swarm nodes join the swarm upon creation by looking up the swarm leader manager's IP in the DynamoDB table, and they request a token from it. If they don't find any IP there, they assume they are the first one and become the leader of a new swarm. The DynamoDB table name to use is passed as an env variable in the instance user data. The leader manager then grants a token only to instances that are members of the appropriate IAM role (I think).

All this is controlled by a script that runs at boot inside a container called init-aws; you can find its logs by running docker ps -a after a node boots and then docker logs container_id. The queues are mostly used for node deregistration when nodes are being terminated, and that's managed by the guide-aws containers.

All this is not really documented AFAIK but who doesn't like a bit of reverse engineering ;)
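
For the curious, the DynamoDB lookup described above can be reproduced from the CLI. A hedged sketch (the table name is whatever you passed in user data, and the key layout is inferred from poking at the init-aws container, so treat it as an assumption):

# ask the table who the current swarm leader is (hypothetical key schema)
aws dynamodb get-item \
    --region us-east-1 \
    --table-name my-swarm-table \
    --key '{"node_type": {"S": "primary_manager"}}'
# if an item with the leader's IP comes back, the node requests a join token from it;
# if nothing comes back, the node runs "docker swarm init" and records itself as leader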

adrissss avatar Dec 12 '17 00:12 adrissss

Also noticing, as I go to see how well this works today, that the compose file syntax isn't really documented here. I don't think that'll be too hard to figure out, but it still might be nice to have documented.

xenoterracide avatar Dec 12 '17 17:12 xenoterracide

You mean this? https://docs.docker.com/compose/compose-file/

adrissss avatar Dec 12 '17 18:12 adrissss

Not exactly. I guess I meant more specifically that there isn't an example of using this driver with compose. It took me about three tries to get the syntax right, since my existing volumes weren't specifying drivers, and especially not driver_opts.
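
A minimal volume stanza that works ends up looking roughly like this (a sketch; the volume name is arbitrary, and the driver_opts shown are the ones that appear later in this thread):

volumes:
  mydata:
    driver: "cloudstor:aws"
    driver_opts:
      backing: relocatable
      ebstype: gp2
      size: "10"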

xenoterracide avatar Dec 12 '17 20:12 xenoterracide

Got you: the driver's configuration and options... yes, poorly documented to say the least. I've never used it with EBS volumes, just EFS.

If you can share a working config with EBS, that would be great.

adrissss avatar Dec 12 '17 20:12 adrissss

Redacting; looks like it was just taking its time (much, much longer than my cloud-init/udev script).

xenoterracide avatar Dec 12 '17 20:12 xenoterracide

As far as I can tell, it is not possible to specify the EBS volume that you want to attach? Is that correct? And if I'm not using the DynamoDB table, how does it know which EBS volume to restore? Does relocation still work?

xenoterracide avatar Dec 12 '17 21:12 xenoterracide

Awesome. These are my logs from sonatype/nexus:

d Dec 13 01:51:07 UTC 2017
[email protected]    | 02:19:12,757 |-INFO in c.q.l.core.rolling.DefaultTimeBasedFileNamingAndTriggeringPolicy - Setting initial period to Wed Dec 13 02:19:12 UTC 2017
[email protected]    | 02:19:12,759 |-INFO in ch.qos.logback.core.rolling.RollingFileAppender[logfile] - Active log file name: appender.file_IS_UNDEFINED
[email protected]    | 01:51:07,168 |-INFO in ch.qos.logback.core.rolling.RollingFileAppender[logfile] - Active log file name: appender.file_IS_UNDEFINED
[email protected]    | 01:51:07,168 |-INFO in ch.qos.logback.core.rolling.RollingFileAppender[logfile] - File property is set to [appender.file_IS_UNDEFINED]
[email protected]    | 02:19:12,759 |-INFO in ch.qos.logback.core.rolling.RollingFileAppender[logfile] - File property is set to [appender.file_IS_UNDEFINED]
[email protected]    | 02:19:12,782 |-ERROR in ch.qos.logback.core.rolling.RollingFileAppender[logfile] - openFile(appender.file_IS_UNDEFINED,true) call failed. java.io.FileNotFoundException: appender.file_IS_UNDEFINED (Permission denied)
[email protected]    | 01:51:07,170 |-ERROR in ch.qos.logback.core.rolling.RollingFileAppender[logfile] - openFile(appender.file_IS_UNDEFINED,true) call failed. java.io.FileNotFoundException: appender.file_IS_UNDEFINED (Permission denied)
[email protected]    | 	at java.io.FileNotFoundException: appender.file_IS_UNDEFINED (Permission denied)
[email protected]    | 	at java.io.FileNotFoundException: appender.file_IS_UNDEFINED (Permission denied)

And here's why I think that's happening: the drives have been mounted read-only:

[root@ip-192-169-0-202 ~]# mount | grep xvd
/dev/xvda1 on / type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
/dev/xvda1 on /var/lib/docker/plugins type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
/dev/xvda1 on /var/lib/docker/plugins/8f0bc9bdeb34f33ece3e8515bf051abe8e047d5019d74b70ed90cc1fbd8c9fa7/propagated-mount type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
/dev/xvda1 on /var/lib/docker/plugins/8f0bc9bdeb34f33ece3e8515bf051abe8e047d5019d74b70ed90cc1fbd8c9fa7/rootfs/mnt type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
/dev/xvda1 on /var/lib/docker/overlay type xfs (rw,relatime,seclabel,attr2,inode64,noquota)
/dev/xvdf on /var/lib/docker/plugins/8f0bc9bdeb34f33ece3e8515bf051abe8e047d5019d74b70ed90cc1fbd8c9fa7/rootfs/mnt/ebs/build_jenkins type ext4 (ro,relatime,seclabel,data=ordered)
/dev/xvdf on /var/lib/docker/plugins/8f0bc9bdeb34f33ece3e8515bf051abe8e047d5019d74b70ed90cc1fbd8c9fa7/propagated-mount/ebs/build_jenkins type ext4 (ro,relatime,seclabel,data=ordered)
/dev/xvdg on /var/lib/docker/plugins/8f0bc9bdeb34f33ece3e8515bf051abe8e047d5019d74b70ed90cc1fbd8c9fa7/rootfs/mnt/ebs/build_nexus type ext4 (ro,relatime,seclabel,data=ordered)
/dev/xvdg on /var/lib/docker/plugins/8f0bc9bdeb34f33ece3e8515bf051abe8e047d5019d74b70ed90cc1fbd8c9fa7/propagated-mount/ebs/build_nexus type ext4 (ro,relatime,seclabel,data=ordered)
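
One way to narrow down whether the read-only state comes from the block device itself or only from the mount (a diagnostic sketch, run on the affected node):

# prints 1 if the kernel considers the device itself read-only
blockdev --getro /dev/xvdf
# look for I/O errors or forced read-only remounts
dmesg | grep -iE 'xvdf|remount'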

Here's the compose file:

version: "3"
services:
  nexus:
    image: sonatype/nexus:latest
    ports:
    - "8081:8081"
    volumes:
    - "nexus:/sonatype-work:Z"
    networks:
      cluster:
        aliases:
        - nexus
    environment:
        JAVA_MIN_HEAP: 100m
        JAVA_MAX_HEAP: 500m
        CONTEXT_PATH: /
    deploy:
        resources:
            limits:
              memory: 600M
            reservations:
              memory: 500M
  jenkins:
    image: xenoterracide/dex-docker:jenkins-lts-alpine-docker
    ports:
    - "8080:8080"
    environment:
        JAVA_OPTS: "-Djava.awt.headless=true"
        HOST_DOCKER_GID: 994
    volumes:
    - "jenkins:/var/jenkins_home:Z"
    - "/var/run/docker.sock:/var/run/docker.sock"
    networks:
        cluster:
            aliases:
            - jenkins
volumes:
    nexus:
        driver: cloudstor:aws
        driver_opts:
            size: 2
    jenkins:
        driver: cloudstor:aws
        driver_opts:
            size: 10
networks:
    cluster:

And the code that sets it up:

docker plugin install --alias cloudstor:aws --grant-all-permissions \
    docker4x/cloudstor:$(docker version --format '{{.Server.Version}}')-aws1 \
    CLOUD_PLATFORM=AWS AWS_REGION=us-east-1 AWS_STACK_ID=build EFS_SUPPORTED=0 DEBUG=1
docker stack deploy --compose-file /tmp/docker-compose.yml build
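
If you want to sanity-check the result, the plugin and the stack's volumes should show up afterwards (volume names get prefixed with the stack name, here build):

docker plugin ls
docker volume ls --filter driver=cloudstor:aws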

xenoterracide avatar Dec 13 '17 15:12 xenoterracide

@adrissss Any updates from you regarding the Terraform port? It would be great if you could give me some hints. I did port the whole CloudFormation script of 18.03.0-ce-aws1, with some changes to the security group and ASG; mainly, the VPC is changed to 172.51.0.0/16.

I started without a stack-name and stack-id, and on all machines init-aws had the error "Invalid WaitConditionHandle URL specified" but exited with error code 0. I entered a dummy stack-name and dummy stack-id; even here the init-aws container exited with error code 0 but logged "ValidationError: Stack Dummy-Stack-Name does not exist".

The meta-aws logs from the primary manager show that the other managers/workers tried to get the token information. In the corresponding logs from the other machines, the wget call throws the error "wget: can't connect to remote host (172.51..): Connection refused" and keeps retrying, failing after a few retries. [For trial purposes, the security group is open to all traffic from 0.0.0.0/0.]

When I manually try to wget from a manager instance to the primary manager with "wget -qO- http://MANAGER-IP:9024/token/manager/", I get the error "wget: server returned error: HTTP/1.1 403 Forbidden".

Why is the connection refused when querying the primary manager, and what exactly is the query? I tried wget -qO- http://MANAGER-IP:9024/token/manager/ from nodes other than the primary manager and I get the same error.

Is there really no need for any CloudFormation-related information for the docker4x/... containers, and do they truly not rely on CloudFormation?

beingamarnath avatar May 09 '18 10:05 beingamarnath

@xenoterracide: How did you fix it? I have exactly the same problem:

test1_grafana.1.yj4a8mwy7dzp@***    | mkdir: cannot create directory '/var/lib/grafana/plugins': Permission denied
test1_grafana.1.yj4a8mwy7dzp@***    | GF_PATHS_DATA='/var/lib/grafana' is not writable.
$ docker plugin install docker4x/cloudstor:18.03.1-ce-aws1 --alias cloudstor:aws \
--grant-all-permissions CLOUD_PLATFORM=AWS AWS_REGION=eu-central-1 \
DEBUG=1 EFS_SUPPORTED=0 AWS_STACK_ID=test 
$ docker version
Client:
 Version:      18.03.1-ce
 API version:  1.37
 Go version:   go1.9.6
 Git commit:   3dfb8343b139d6342acfd9975d7f1068b5b1c3d3
 Built:        Wed Jul 25 00:48:56 2018
 OS/Arch:      linux/amd64
 Experimental: false
 Orchestrator: swarm

Server:
 Engine:
  Version:      18.03.1-ce
  API version:  1.37 (minimum version 1.12)
  Go version:   go1.9.6
  Git commit:   7390fc6/18.03.1-ce
  Built:        Wed Jul 25 00:51:07 2018
  OS/Arch:      linux/amd64
  Experimental: true
$ cat docker-compose.yml
version: "3.6"

services:
  grafana:
    image: grafana/grafana
    volumes:
      - grafana_data:/var/lib/grafana

volumes:
  grafana_data:
    driver: cloudstor:aws
    driver_opts:
      backing: relocatable
      size: "10"
      ebstype: "gp2"

stefangraber avatar Jul 27 '18 15:07 stefangraber

Good question; I don't know that I did. I gave up on this as a solution: I found it too fragile and slow. Ultimately I found that using a udev rule to attach, format, and mount was faster and more stable (roughly sketched below).
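
That approach looks roughly like this (a sketch with a hypothetical device name and helper script, not the exact rule):

# /etc/udev/rules.d/99-ebs.rules
# when the attached EBS device node appears, hand it to a script that formats
# the volume if it's empty and mounts it at a well-known path
KERNEL=="xvdf", SUBSYSTEM=="block", ACTION=="add", RUN+="/usr/local/bin/mount-ebs.sh /dev/%k"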

xenoterracide avatar Jul 27 '18 15:07 xenoterracide

I guess @beingamarnath doesn't need the answer anymore, but for others:

It seems "metaserver" (the Go-based, closed source http service running on all managers but seems to be only useful for the Leader one) checks the "swarm-node-type" tag on EC2 instances, and if it does not exist or not valid for the type of the node (either "manager" or "worker") returns a 403.

Adding this tag to the launch configuration immediately resolved that issue for me.
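
With the AWS CLI, one way to get that tag onto new instances is to tag the Auto Scaling group with PropagateAtLaunch (a sketch; the ASG name and node type are placeholders):

aws autoscaling create-or-update-tags --tags \
    "ResourceId=my-manager-asg,ResourceType=auto-scaling-group,Key=swarm-node-type,Value=manager,PropagateAtLaunch=true"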

YektaLeblebici avatar Dec 04 '18 14:12 YektaLeblebici