===========================================
Automate deployment of Ceph clusters in AWS
===========================================

:Author: Nathan Cutler
:Code license: BSD 3 Clause
:Documentation license: Creative Commons Attribution-ShareAlike (CC BY-SA)

.. contents::
   :depth: 3

Acknowledgements

Several parts of this application - especially the command-line interface design and code - are derived from Loic Dachary's work in ceph-workbench_.

.. _ceph-workbench: http://ceph-workbench.readthedocs.org/en/latest/

Introduction

This document describes the ceph-auto-aws_ software for automating deployment of Ceph clusters in Amazon Web Services (AWS) - specifically the Elastic Compute Cloud (EC2) and Virtual Private Cloud (VPC) services.

.. _ceph-auto-aws: https://github.com/smithfarm/ceph-auto-aws

The software can deploy any number of identical clusters, from 1 to 251.

So far, the software has been used in "hands-on" sessions, to provide each attendee with their own cluster to play with. It could also facilitate deployment of one-off clusters to test various Ceph configurations.

Scripting is provided for automating the provisioning of:

* a VPC instance
* subnets within the VPC
* cluster instances (nodes) within each subnet
* Salt Master instance (used to control the cluster instances)

The scripting is written in Python and relies on boto_ ("An integrated interface to current and future infrastructural services offered by Amazon Web Services") and SaltStack_ (a configuration management and distributed remote execution system).

Configuration and state are stored in a YAML_ file. YAML_ is a human-friendly data serialization standard for all programming languages.

.. _boto: http://boto.cloudhackers.com/en/latest/index.html
.. _SaltStack: https://docs.saltstack.com/en/latest/topics/
.. _YAML: http://yaml.org/

Prerequisites and assumptions

We assume that you have access to Amazon Web Services (AWS) Elastic Compute Cloud (EC2) and Virtual Private Cloud (VPC). That means you can log in via a web browser and access the EC2 and VPC dashboards.

We further assume that you have a relatively recent version of Python and virtualenv_ installed on your system. On openSUSE, Python should already be installed and installing virtualenv_ should be as simple as running the following command as root::

# zypper install python-virtualenv

If something in this software (or this document) doesn't work for you, open a bug report in the `GitHub issue tracker`_.

.. _GitHub issue tracker: https://github.com/smithfarm/ceph-auto-aws/issues

Early steps

Make an AWS user

If you are already logged in as an AWS IAM user, you can skip this section.

Set up an IAM user using the `Creating an IAM User in Your AWS Account`_ section of the AWS documentation.

We placed our user in the "ec2_full_access" group.

.. _Creating an IAM User in Your AWS Account: http://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html

Obtain access key

Access to AWS via boto requires an access key (Access Key ID and Secret Access Key).

First, check whether you were given an Access Key ID and Secret Access Key along with your AWS web console credentials.

If you have an IAM user, see the `Managing Access Keys for IAM Users`_ section of the AWS documentation. The access key comes in a file called "credentials.csv". Put this in a safe place.

However you got your AWS access key (Access Key ID and Secret Access Key), you will need to put them in ~/.boto as described in the `Configuring boto credentials section of the boto documentation`_.

.. _Managing Access Keys for IAM Users: http://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_access-keys.html
.. _Configuring boto credentials section of the boto documentation: http://boto.readthedocs.org/en/latest/getting_started.html#configuring-boto-credentials

Sample ~/.boto file::

[Credentials]
aws_access_key_id = [gobbledygook]
aws_secret_access_key = [even_longer_gobbledygook]
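
Before running anything against AWS, it can help to confirm that the file has the expected shape. The following is a minimal sketch using only the Python standard library; it is not part of ceph-auto-aws, and the helper name ``missing_credentials`` is hypothetical. It parses text in the format shown above and reports which of the two keys are absent or empty.

```python
import configparser

# Same layout as the sample ~/.boto file above (placeholder values).
SAMPLE = """\
[Credentials]
aws_access_key_id = [gobbledygook]
aws_secret_access_key = [even_longer_gobbledygook]
"""

REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key")


def missing_credentials(text):
    """Return the required keys that are absent or empty in the given text."""
    # Interpolation is disabled so that a '%' in a secret key is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section("Credentials"):
        return list(REQUIRED_KEYS)
    return [
        key
        for key in REQUIRED_KEYS
        if not parser.get("Credentials", key, fallback="").strip()
    ]


print(missing_credentials(SAMPLE))
```

To check a real file, read it first, e.g. ``open(os.path.expanduser("~/.boto")).read()``, and pass the text to the helper.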

Clone repo

Clone this repo to your local machine::

$ git clone https://github.com/smithfarm/ceph-auto-aws

All of the following instructions assume you are in the directory containing the local clone.

Installation

This software is designed to be installed in a standalone virtual Python environment, implemented with virtualenv_.

Installation is a two-step process. First, run the bootstrap script::

$ ./bootstrap

This installs the virtual environment in the virtualenv/ directory. The second step is to activate the virtualenv_. The shell prompt changes to indicate that the virtual environment is active::

$ source virtualenv/bin/activate
(virtualenv)$

Use the deactivate command to leave::

(virtualenv)$ deactivate
$

.. _virtualenv: https://virtualenv.pypa.io/en/latest/

Get familiar with ho

All scripting features are implemented as subcommands of a single script: ho (an abbreviation of "hands-on")::

(virtualenv)$ ho --help

Test AWS connectivity

Run the following command to test whether you have your AWS credentials in order::

(virtualenv)$ ho probe aws
2016-03-27 20:30:16,554 INFO Connected to AWS EC2

Configuration

YAML file

Interaction with AWS is controlled by a configuration file called aws.yaml. By default, ho looks for this file in the current directory. If it is not found, a new one is created.

We assume that you are starting from scratch. To get started, run the following command::

(virtualenv)$ ho probe yaml
2016-03-30 21:35:12,105 INFO Probing 'subnets' stanza
2016-03-30 21:35:12,105 INFO Loaded yaml tree from './aws.yaml'
2016-03-30 21:35:12,106 INFO Probing 'keyname' stanza
2016-03-30 21:35:12,106 INFO Probing 'vpc' stanza
2016-03-30 21:35:12,108 INFO Probing 'role-definitions' stanza
2016-03-30 21:35:12,111 INFO Detected roles ['admin', 'windows', 'master', 'mon', 'defaults', 'osd']
2016-03-30 21:35:12,111 INFO Probing 'region' stanza
2016-03-30 21:35:12,113 INFO Probing 'cluster-definition' stanza
2016-03-30 21:35:12,115 INFO Detected cluster-definition stanza
2016-03-30 21:35:12,115 INFO Detected role 'admin' in cluster definition
2016-03-30 21:35:12,115 INFO Probing 'delegates' stanza
2016-03-30 21:35:12,117 INFO Probing 'types' stanza
2016-03-30 21:35:12,117 INFO YAML tree is sane

You can see that the YAML file has been created::

(virtualenv)$ file aws.yaml
aws.yaml: ASCII text

You can run ho probe yaml anytime to check your configuration file, and especially after any manual modifications.

Region and Availability Zone

The next step is to configure the AWS Region. The default is eu-west-1, i.e. "EU (Ireland)". If you want to use a different region, open the YAML file (aws.yaml in the current directory) and change the region_str value in the following stanza::

region:
  availability_zone:
  region_str: eu-west-1

If you don't care about the availability zone, just leave it unset. AWS will assign one.

If you want to set an availability zone, you must do so before subnets are created, since subnets exist within an availability zone. Once subnets are created the availability zone cannot be changed (or, more accurately, it can be changed but ho install delegates will then fail because of the availability zone mismatch).

Next, verify that you can connect to that region by running the command::

(virtualenv)$ ho probe region
2016-10-18 13:51:58,156 INFO Loaded yaml tree from './aws.yaml'
2016-10-18 13:51:58,156 INFO Testing connectivity to AWS Region {'region_str': 'us-east-1', 'availability_zone': None}
2016-10-18 13:51:58,404 INFO Detected 5 VPCs
2016-10-18 13:51:58,404 INFO Availability zone not set in YAML

Virtual Private Cloud

To ensure that our demo clusters do not interfere with other AWS projects, we use a Virtual Private Cloud (VPC) containing a number of subnets.

All the delegates will share a single VPC 10.0.0.0/16. Within that VPC there will be a /24 subnet for each delegate, plus one for the Salt Master.

The Salt Master resides in its own subnet: 10.0.0.0/24.

Each delegate will be assigned a number, e.g. 12. The subnet of delegate 12 will be 10.0.12.0/24.
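
The addressing scheme above can be sketched in Python. This is an illustration only, not code taken from ceph-auto-aws: the function name ``delegate_subnet`` and the constant ``MAX_DELEGATES`` are hypothetical, and the upper bound of 251 is simply the figure quoted in the introduction.

```python
import ipaddress

VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")
MAX_DELEGATES = 251  # upper bound quoted in the introduction


def delegate_subnet(delegate):
    """Return the /24 subnet for a delegate number (0 is the Salt Master)."""
    if not 0 <= delegate <= MAX_DELEGATES:
        raise ValueError("delegate number out of range: %r" % delegate)
    subnet = ipaddress.ip_network("10.0.%d.0/24" % delegate)
    # Sanity check: every delegate subnet must lie inside the shared VPC.
    assert subnet.subnet_of(VPC_CIDR)
    return subnet


print(delegate_subnet(0))   # Salt Master subnet: 10.0.0.0/24
print(delegate_subnet(12))  # delegate 12 subnet: 10.0.12.0/24
```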

VPC configuration

If you are setting up a VPC for the first time, run the following command to create one::

(virtualenv)$ ho install vpc
2016-03-30 23:20:34,407 INFO Loaded yaml tree from './aws.yaml'
2016-03-30 23:20:34,686 INFO New VPC ID vpc-cfd7c9aa created with CIDR block 10.0.0.0/16
2016-03-30 23:20:34,816 INFO Object VPC:vpc-cfd7c9aa tagged with Name=handson

Once the VPC has been created, the vpc stanza will look like this::

vpc:
  cidr_block: 10.0.0.0/16
  id: vpc-cfd7c9aa

Note that ho install vpc is idempotent: you can run it as many times as you want. Try running it a second time::

(virtualenv)$ ho install vpc
2016-03-30 23:22:00,612 INFO Loaded yaml tree from './aws.yaml'
2016-03-30 23:22:00,613 INFO VPC ID according to yaml is vpc-cfd7c9aa
2016-03-30 23:22:00,907 INFO VPC ID is vpc-cfd7c9aa, CIDR block is 10.0.0.0/16

Any other output (and especially any traceback) probably means your VPC is not set up properly.

Internet Gateway

Initially, the VPC will not have an Internet Gateway, and so it will not be able to communicate with the outside world in any way (regardless of Security Group settings in any instances running inside the VPC). This includes SSH access into the VPC from outside.

The fact that VPCs are by default completely isolated from the outside world is by design, but it is not appropriate for a hands-on demonstration.

To remedy this, first create an Internet Gateway and attach it to the VPC.

The steps to create the Internet Gateway are explained in detail in the `aws official docs`_. You can create an Internet Gateway from https://console.aws.amazon.com/vpc/ and attach it to the VPC (named handson by default) created in the previous steps.

WARNING: The scripting does not do this step for you!

.. _aws official docs: http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Internet_Gateway.html#Add_IGW_Attach_Gateway

Route Table

Even with the Internet Gateway in place, no packets originating from the VPC will be routed to the outside until a default route is added. This is because the default Route Table looks like this:

=========== ======= ======= ===========
Destination Target  Status  Propagated
=========== ======= ======= ===========
10.0.0.0/16 local   Active  No
=========== ======= ======= ===========

Add a "default route" line to this table, so it looks like this:

=========== ======= ======= ===========
Destination Target  Status  Propagated
=========== ======= ======= ===========
10.0.0.0/16 local   Active  No
0.0.0.0/0   igw-... Active  No
=========== ======= ======= ===========

WARNING: The scripting does not do this step for you!

Network ACL

Network ACLs are like firewalls at the subnet level. For more information, see the `Network ACLs chapter of the AWS documentation`_.

.. _Network ACLs chapter of the AWS documentation: http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_ACLs.html

Even with the Internet Gateway and the Route Table set up, networking may still not work as expected inside the VPC. If this is the case, check if there is a Network ACL associated with your VPC, and check the settings::

"Security" -> "Network ACLs" in VPC Dashboard

A working (wide open) Network ACL table might look like this ("Inbound Rules" and "Outbound Rules"):

====== =========== =========== =========== =========== ============
Rule # Type        Protocol    Port Range  Destination Allow / Deny
====== =========== =========== =========== =========== ============
100    ALL Traffic ALL         ALL         0.0.0.0/0   ALLOW
\*     ALL Traffic ALL         ALL         0.0.0.0/0   DENY
====== =========== =========== =========== =========== ============

Make sure you are looking at the Network ACL that is associated with your VPC.

WARNING: The scripting does not do this step for you!

Security Groups

Security Groups are like firewalls at the instance (individual VM) level. For more information, see the `Security Groups for Your VPC`_ chapter of the AWS documentation.

.. _Security Groups for Your VPC: http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_SecurityGroups.html

Even with the Internet Gateway and the Route Table set up, and Network ACL wide open (or disabled), you will still not be able to ping your AWS nodes unless you edit the Inbound Rules table of your VPC's default Security Group.

You will find it under::

"Security" -> "Security Groups" in VPC Dashboard

By default, the Inbound Rules table will look like this:

=========== ======== ========== ======
Type        Protocol Port Range Source
=========== ======== ========== ======
ALL Traffic ALL      ALL        sg-...
=========== ======== ========== ======

Note that only packets originating from within the same Security Group are accepted. All others are dropped.

Edit the line so Source is set to 0.0.0.0/0:

=========== ======== ========== ===========
Type        Protocol Port Range Source
=========== ======== ========== ===========
ALL Traffic ALL      ALL        0.0.0.0/0
=========== ======== ========== ===========

Such a setup means the machines in your VPC will be exposed to scanning, and if they have any unpatched vulnerabilities evil people might take control of them.

To address this, replace the 0.0.0.0/0 line in the Inbound Rules table with lines covering all the public network segments from which people will be accessing your VPC.

WARNING: The scripting does not do this step for you!

Subnets

As explained in the introduction to the `Virtual Private Cloud`_ chapter, each delegate will have their own "Class C" /24 virtual network, or "subnet".

Subnet configuration

Initially, the subnets stanza of your aws.yaml file should be empty::

subnets: {}

Do not add anything here: the scripting will create subnets automatically based on the number of delegates given in the delegates stanza, e.g.::

delegates: 1

If you want more than one cluster, change the delegates stanza in the YAML file now.

Create subnets

To ensure that the subnets are created for each delegate plus the Salt Master, you should run::

(virtualenv)$ ho install subnets --all --master
2016-04-03 07:59:03,992 INFO Loaded yaml tree from './aws.yaml'
2016-04-03 07:59:03,992 INFO Delegate list is [0, 1]
2016-04-03 07:59:03,992 INFO Installing subnet for delegate 0
...

This will create a 10.0.0.0/24 subnet for the Salt Master and one additional /24 for each delegate (one in the default case). It will also add the appropriate tags to the subnet objects.

Like ho install vpc, this command is idempotent.

Subnet caveat

AWS reserves both the first four IP addresses and the last IP address in each subnet's CIDR block. For example, in the 10.0.0.0/24 subnet, these IP addresses are not available for use:

* 10.0.0.0: Network address.
* 10.0.0.1: Reserved by AWS for the VPC router.
* 10.0.0.2: Reserved by AWS for mapping to the Amazon-provided DNS.
* 10.0.0.3: Reserved by AWS for future use.
* 10.0.0.255: Network broadcast address. We do not support broadcast in a VPC, therefore we reserve this address.

For this reason, instances must not be assigned last_octet values 0, 1, 2, 3, or 255.
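
The same rule applies to every delegate subnet, not just 10.0.0.0/24. As a rough sketch (the helper name ``reserved_addresses`` is hypothetical and not part of this software), the reserved addresses of any subnet can be derived with the standard ``ipaddress`` module:

```python
import ipaddress


def reserved_addresses(cidr):
    """Return the addresses AWS reserves in a subnet: the first four and the last."""
    net = ipaddress.ip_network(cidr)
    first_four = [net.network_address + offset for offset in range(4)]
    return [str(addr) for addr in first_four + [net.broadcast_address]]


print(reserved_addresses("10.0.12.0/24"))
```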

Role and cluster definition

Once the subnets are set up, the next step is to define the cluster each delegate will receive.

This software assumes that each delegate will have one cluster and all the clusters will be identical.

Each cluster consists of some number of instances, and each instance has a "role" that it plays in the cluster.

NOTE: As far as this software is concerned, the term "role" is interchangeable with "node", "instance" or "virtual machine"!

Before you can install a cluster (or twelve!), you must first edit the `cluster definition`_ and `role definitions`_ in the YAML file.

Role definitions

Roles are defined in the role-definitions stanza of the YAML. This stanza is a mapping, the keys of which are the names of the respective roles.

There are two special roles: defaults and master. The former defines the set of permissible role attributes and their default values. The latter defines the attributes of the Salt Master node.

Each role definition may contain one or more of the following attributes:

========================= ======================================================
Role definition attribute Description
========================= ======================================================
ami-id                    AMI ID of image from which to create the instance
last-octet                value of last octet of instance IP address (10.0.0.x)
node-no                   arbitrary number that can optionally be associated
                          with a node
replace-from-environment  FIXME
type                      the Instance Type
user-data                 file containing user-data
volume                    disk volume to be attached to the instance (optional)
========================= ======================================================

If you are setting up a hands-on, now would be a good time to define your roles. The following sections should help.

ami-id (REQUIRED)
^^^^^^^^^^^^^^^^^

The ami-id is the ID of the `Amazon Machine Image (AMI)`_ to use when provisioning the node. Basically, it should be a recent Linux image that you are capable of installing Ceph on.

.. _Amazon Machine Image (AMI): http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html

last-octet (REQUIRED) ^^^^^^^^^^^^^^^^^^^^^

This attribute should be an integer value between 4 and 254 (inclusive) - see `Subnet caveat`_. Together with the delegate number, it determines the IP address of the node. For example, if the delegate number is 3 and last-octet is 8, the IP address will be 10.0.3.8/24.
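
Expressed as code, the rule looks roughly like this. The function ``node_ip`` is a hypothetical illustration, not the implementation used by ho; it only encodes the range check and the address layout described above.

```python
def node_ip(delegate, last_octet):
    """Combine a delegate number and a last-octet value into a node address."""
    # Values 0-3 and 255 are reserved by AWS in every subnet.
    if not 4 <= last_octet <= 254:
        raise ValueError("last-octet must be between 4 and 254, got %r" % last_octet)
    return "10.0.%d.%d" % (delegate, last_octet)


print(node_ip(3, 8))  # 10.0.3.8
```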

node-no (OPTIONAL)
^^^^^^^^^^^^^^^^^^

This is an entirely optional value that can be associated with a node. This number determines what @@NODE_NO@@ in the user-data will be replaced with.
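
Conceptually this is a plain text substitution. The sketch below assumes the placeholder is replaced literally wherever it occurs; the function name ``render_user_data`` and the sample script line are made up for illustration and do not come from this repository.

```python
def render_user_data(template, node_no):
    """Replace every @@NODE_NO@@ placeholder with the node-no value."""
    return template.replace("@@NODE_NO@@", str(node_no))


# Hypothetical one-line user-data script using the placeholder.
print(render_user_data("echo 'I am node @@NODE_NO@@'\n", 2))
```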

replace-from-environment (OPTIONAL) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

FIXME

type (REQUIRED)
^^^^^^^^^^^^^^^

This determines the `Instance Type`_ of the node. If all the nodes will have the same Instance Type, you can just set it once in the defaults section. It does not need to be set individually for each role.

.. _Instance Type: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html

The instance types are described at https://aws.amazon.com/ec2/instance-types/

I am using t2.small for cluster nodes and t2.micro for the Salt Master. Both have a single vCPU; t2.small has 2 GB of memory and t2.micro has 1 GB.

There are two "types" of instance types: "ebs" and "paravirtual". All the t2.xxx types are EBS-only. EBS stands for "Elastic Block Store". This is important to know if you make a snapshot and want to create an AMI from that snapshot. (Also, I think any volumes you create must be EBS if you want to use them with t2.xxx instances.)

user-data (OPTIONAL)
^^^^^^^^^^^^^^^^^^^^

After the image boots for the first time, we need to run a custom setup script. In cloud terminology this is known as "user-data". Often the user-data takes the form of "cloud-init" YAML. However, with AWS it can be an ordinary shell script.

For testing, you can type or cut-and-paste user-data in the web console, into the box located at the very bottom of the "3. Configure Instance" dialog, hidden under "Advanced Details".

Once you have developed just the right user-data for your application, put it in a file, and set the user-data YAML attribute to the absolute or relative path to this file. Whatever it is, the user-data in that file will be run in the instance when it first launches. See `Running Commands on Your Linux Instance at Launch`_.

.. _Running Commands on Your Linux Instance at Launch: http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/user-data.html

This value is optional in the sense that ho will instantiate nodes without it, but you will probably need it if you want to automate the process of installing and starting the Salt Minion service on the nodes.

volume (OPTIONAL)
^^^^^^^^^^^^^^^^^

Each node has a root volume, the size of which is defined by the Instance Type (VERIFY). This is sufficient for admin nodes and monitor-only nodes. If you want to run an OSD on a node, though, a separate volume will be necessary. Typically this will be an `Amazon Elastic Block Store (EBS)`_ volume.

.. _Amazon Elastic Block Store (EBS): https://aws.amazon.com/ebs/

The volume attribute takes an integer value which is interpreted as the volume size in Gigabytes.

If the attribute is missing, or has no value, or has a zero value, no separate volume is created.
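
The three "no volume" cases can be summarized in a few lines. This is a sketch of the rule as stated here, not the code ho actually runs; ``volume_size_gb`` is a hypothetical name, and the role is assumed to be a plain dictionary as loaded from YAML.

```python
def volume_size_gb(role):
    """Return the size of the extra volume in gigabytes, or None if no volume is wanted."""
    size = role.get("volume")  # missing key and empty YAML value both yield None
    if not size:               # covers None and 0
        return None
    return int(size)


print(volume_size_gb({"last-octet": 11, "volume": 20}))    # 20
print(volume_size_gb({"last-octet": 10, "volume": None}))  # None
```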

Cluster definition

Once you have defined the roles, the next step is to stipulate the set of roles that will constitute a cluster. Remember, each delegate will get one cluster (one set of roles).

The cluster is defined in the cluster-definition stanza of the YAML file. This stanza consists of a "collection" (list, array) of instance definitions. Each instance definition must contain a role attribute defining the instance role, which should be a very short string (e.g., "mon1") describing the role this instance will play in the cluster.

The value of each role attribute must match one of the roles defined in the role-definitions YAML stanza (see `Role definitions`_).

For example, a reasonable demo cluster might consist of three MON/OSD nodes (roles mon1, mon2, and mon3, respectively) and an "admin node" with a public IP address::

cluster-definition:
  - role: admin
  - role: mon1
  - role: mon2
  - role: mon3

Provided the roles are properly defined in the role-definitions stanza, this is a legal cluster definition.
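
The matching rule is easy to check by hand, or with a few lines of Python. The sketch below is illustrative only and does not reproduce the checks performed by ho; ``undefined_roles`` is a hypothetical helper that works on the two stanzas as already-loaded Python data.

```python
def undefined_roles(cluster_definition, role_definitions):
    """Return the roles used in the cluster definition that have no role definition."""
    return [
        instance.get("role")
        for instance in cluster_definition
        if instance.get("role") not in role_definitions
    ]


cluster = [{"role": "admin"}, {"role": "mon1"}, {"role": "mon2"}, {"role": "mon3"}]
roles = {"defaults": {}, "master": {}, "admin": {}, "mon1": {}, "mon2": {}}

print(undefined_roles(cluster, roles))  # ['mon3']
```

An empty result means every role named in the cluster definition has a matching entry.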

Validation of role and cluster definitions

Before you actually try to spin up a cluster, it's a good idea to validate your YAML::

(virtualenv)$ ho probe yaml

This command loads the YAML file and performs various validation checks, including basic sanity checks on the cluster-definition and role-definitions stanzas.

Keypairs

Before you spin up any Delegate Clusters, you will need to generate delegate (SSH) keypairs and import them to AWS.

Keyname

The keyname stanza in the YAML file determines how the keypairs will be named. If you do nothing, it will be set to your username. If your username is "regnaw", the Salt Master's keypair will be named regnaw-d0, Delegate 1's keypair will be regnaw-d1, etc.
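
The naming pattern can be written down as a one-line function. ``keypair_name`` is a hypothetical helper shown only to make the pattern explicit; delegate 0 is the Salt Master.

```python
def keypair_name(keyname, delegate):
    """Build a keypair name from the keyname stanza and a delegate number."""
    return "%s-d%d" % (keyname, delegate)


print(keypair_name("regnaw", 0))  # regnaw-d0 (Salt Master)
print(keypair_name("regnaw", 1))  # regnaw-d1 (Delegate 1)
```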

If you want the keypair names to be based on some other string, just set the keyname attribute in the YAML file before continuing.

Generate delegate keypairs

Each delegate will have its own keypair. To generate keypairs for all the delegates, do::

$ ./generate-keys.sh

Then, to import them into AWS, do::

$ ho install keypairs --all --master

Delegates

When newly instantiated nodes boot up for the first time, a script called user-data is run as root. The idea is for this script to bring the nodes into a "SaltStack-ready" state - i.e. Salt Master service running on the Salt Master node, Salt Minion services running on the Delegate Cluster nodes, and minions communicating with, and accepting orders from, the Salt Master. SSH access should also be possible using the respective delegate keypair.

To get Ceph running on the cluster nodes, additional steps are necessary. These steps are accomplished by running SaltStack_ commands on the Salt Master node.

At this point, you should have completed the following steps:

* ho probe aws
* ho probe yaml
* ho probe region
* ho install vpc
* create Internet Gateway in VPC Console
* ho install subnets --all --master
* define roles (by editing the YAML file)
* define cluster (by editing the YAML file)
* ./generate-keys.sh
* ho install keypairs --all --master
* write user-data script for the Salt Master
* set user-data attribute of master role to filename of Salt Master user-data script
* write user-data scripts for all your roles
* set user-data attribute of all roles to the appropriate filename

Now you are ready to instantiate nodes. We start with the Salt Master node.

Install Salt Master

Delegate 0 is the Salt Master, but we do not write, e.g., ho install delegates 0. Instead, we pass the --master option like so::

$ ho install delegates --master

.. Theoretically, it is possible to instantiate the Salt Master node and all
.. the Delegate Cluster nodes at once by doing::
..
..   $ ho install delegates --all --master
..
.. In practice, this will not work. The nodes will be instantiated and the
.. user-data scripts will run. However, this is not recommended, because it's
.. a good idea to let the Salt Master node "settle" and verify its proper
.. functioning before instantiating any Delegate Cluster nodes, since these
.. nodes will typically have user-data scripts that automate registration of
.. minion keys with the Salt Master.

It is a good idea to wait until the Salt Master boots up for the first time and finishes running its user-data script before installing any Delegate Clusters.

.. Once the SSH service is running, you can SSH into the Salt Master. Then you can .. tail the logs in FIXME like so:: .. .. $ FIXME FIXME FIXME TAIL THE USER-DATA LOGS

Install Delegate Clusters

This software is capable of automating the installation of multiple Delegate Clusters - up to the number set in the delegates stanza of the YAML file.

If you are just testing the software, it's probably a good idea not to set delegates too high. You could set a value of 1 to start with::

cluster-definition:
  - role: admin

delegates: 1

...

The delegates stanza limits the number of clusters that can be instantiated at once (or at all). A value of 1 means that the ho install delegates command will only take an argument of 1. Any other argument will fail. If you specify --all, it will mean 1.

With the above YAML a single Delegate Cluster will be installed when you run::

$ ho install delegates 1

The cluster will consist of a single admin node which will be instantiated in the 10.0.1.0/24 subnet.

Instance tagging

Automatically, each cluster instance will be tagged as follows:

======== ===========================================
Tag      Description
======== ===========================================
Name     the value of the nametag yaml attribute
Delegate the delegate number
Role     the instance role
======== ===========================================

Stop and start clusters

You can stop and start clusters using the ho stop delegates and ho start delegates commands, respectively. "Stop" in this context triggers an orderly shutdown, so it involves a transition to "powered-off" state. "Start", then, is conceptually similar to powering up.

For example::

$ ho stop delegates 1
$ ho stop delegates 1,3,5-7
$ ho stop delegates --all
$ ho stop delegates --all --master

$ ho start delegates 1
$ ho start delegates 1,3,5-7
$ ho start delegates --all
$ ho start delegates --all --master

The --master option adds delegate 0 (the Salt Master) to the list of delegates to which the operation (start or stop) is applied.

Wipeout clusters

When you are finished with a cluster (or clusters), you can delete it/them by::

$ ho wipeout delegates [DELEGATE_LIST]

where [DELEGATE_LIST] is something like 1-12 for Delegate Clusters one through twelve, 5 for Delegate Cluster five, or 1,3,7-9 for Delegate Clusters one, three, seven, eight, and nine.

Sticking to our minimal example from `Install Delegate Clusters`_, we could wipe out that cluster by::
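
To make the list syntax concrete, here is a small parser for it. This is a sketch written for this document under the assumption that the syntax is exactly "comma-separated numbers and inclusive ranges"; it is not the parser used by ho, and ``parse_delegate_list`` is a hypothetical name.

```python
def parse_delegate_list(spec):
    """Expand a string such as '1,3,7-9' into a sorted list of delegate numbers."""
    delegates = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # A range such as 7-9 includes both endpoints.
            low, high = part.split("-", 1)
            delegates.update(range(int(low), int(high) + 1))
        else:
            delegates.add(int(part))
    return sorted(delegates)


print(parse_delegate_list("1,3,7-9"))  # [1, 3, 7, 8, 9]
```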

$ ho wipeout delegates 1

When you are finished with the Salt Master, you can delete it by adding the --master option, e.g.::

$ ho wipeout delegates --master

You can wipe out all instances, i.e. all Delegate Clusters and the Salt Master, like so::

$ ho wipeout delegates --all --master

NOTE: The wipeout commands discussed in this section remove cluster nodes and EBS volumes only. They do not have any effect on subnets or the VPC. (If needed, those must be wiped out separately.)

Spin up a Delegate Cluster

Take the following example::

cluster-definition:
  - role: admin
  - role: mon1
  - role: mon2
  - role: mon3
  - role: windows

...

role-definitions:
  admin:
    last-octet: 10
    volume:
  defaults:
    ami-id: ami-ff63dd8c
    last-octet:
    replace-from-environment: []
    type: t2.small
    user-data: data/user-data-minions
    volume: 20
  master:
    last-octet: 10
    user-data: data/user-data-master
    volume:
  mon1:
    last-octet: 11
    volume: 20
  mon2:
    last-octet: 12
    volume: 20
  mon3:
    last-octet: 13
    volume: 20
  osd:
    last-octet: 14
    volume: 20
  windows:
    ami-id: ami-c6972fb5
    last-octet: 15
    user-data: data/user-data-windows
    volume:

The user-data-minions script updates each cluster node and adds the repo containing the latest versions of the ceph and ceph-deploy packages.
It also configures and enables the ntp and salt-minion services.

One can follow progress of the user-data script on a given node by sshing into the node and doing::

(Cluster Node)# tail -n 100 -f /var/log/cloud-init-output.log

Once all the cluster nodes have finished running their user-data scripts, you can SSH to the Salt Master and list the minion keys::

(Salt Master)# salt-key -L

This shows the unaccepted keys. Accept them by doing::

(Salt Master)# salt-key -A -y

If there are stale keys from clusters that have been wiped out, you can just delete all keys and wait for the live minions to re-connect::

(Salt Master)# salt-key -D -y

The next step is to run the ceph-admin Salt State on all the nodes. In this example we are spinning up a cluster for Delegate 2::

(Salt Master)# salt -C "G@delegate:2" state.sls ceph-admin

Examine all the output. If there are failures, just run the command over again. Once it is completing without any failures, remotely run the ceph-deploy-sh Salt State on the admin node to deploy a Ceph cluster::

(Salt Master)# salt -C "G@delegate:2 and G@role:admin" state.sls ceph-deploy-sh

This will take a minute or two to complete. If all goes well, it will succeed. If it fails, you have no choice but to wipe out the delegate and start over.

Of course, the gold standard of a well-functioning Ceph cluster is HEALTH_OK. Check the cluster health by running the ceph-s Salt State::

(Salt Master)# salt -C "G@delegate:2 and G@role:admin" state.sls ceph-s

If you want to fill the cluster partially up with some data, do::

(Salt Master)# salt -C "G@delegate:2 and G@role:mon1" state.sls owen-data-sh

At this point, you can SSH into the Delegate 2 admin node and become user "ceph" by doing::

(Delegate 2 admin node)# su - ceph

Lessons Learned from Snow Unix 2016

The following lessons were learned:

* double-check instance limit
* practice spinning up the full number of delegates (not just once, but several times in a row)
* figure out how best to freeze the state so we no longer run "zypper up", exposing ourselves to the risk of a new kernel, etc. coming out

Notes for developers

This software is designed to be run from a virtualenv (created by running the bootstrap script) within a local clone of this git repository.

If you make changes to the code, these will not be automatically reflected in the virtualenv. To make that happen, run the following command in the top-level directory::

python setup.py develop

If the version number is incremented using the release.sh script, the code in the virtualenv can be upgraded by running this command in the top-level directory::

easy_install -U .

The version number has three components, X.Y.Z or major.minor.patch. For example, if the version number is 2.3.1 the major version is 2, the minor version is 3, and the patch level is 1. The version number can be incremented by running the release.sh script with an argument indicating which component should be incremented::

./release.sh major|minor|patch

So, to "bump" the version number from 2.3.1 to 2.3.2, you would do::

./release.sh patch
easy_install -U .
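
The effect of each argument on the version number can be sketched as follows. This illustrates the X.Y.Z scheme only and is not the logic of release.sh; in particular, resetting the lower components to zero on a major or minor bump is the usual convention and is an assumption here, and ``bump_version`` is a hypothetical name.

```python
def bump_version(version, component):
    """Increment one component of an X.Y.Z version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if component == "major":
        # Assumption: lower components reset to zero.
        major, minor, patch = major + 1, 0, 0
    elif component == "minor":
        minor, patch = minor + 1, 0
    elif component == "patch":
        patch += 1
    else:
        raise ValueError("component must be major, minor or patch")
    return "%d.%d.%d" % (major, minor, patch)


print(bump_version("2.3.1", "patch"))  # 2.3.2
```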

Note that the ChangeLog file is updated automatically from the git commit descriptions. You should not attempt to edit the ChangeLog file manually.

Deploying with DeepSea

It is now possible, and expected, to deploy Delegate Clusters using DeepSea.

Caveat

Because the process of deploying DeepSea requires a local Salt Master within the Delegate Cluster, clusters lose their connection with the root Salt Master after deployment. This is unavoidable until someone comes up with a way to run two salt-minion.service instances in a single VM.

How to proceed

In the role definition, specify susecon2017/user-data-root-master for the master node's user-data and susecon2017/user-data-minion for all the minion nodes. When the master and minion (delegate) VMs come up, all the delegate VMs will be configured as Salt Minions pointing to the root Salt Master.

After running ho install delegates --all --master to create the VMs, ssh to the root master VM, become root, and change the current working directory to /srv/salt::

$ ssh -i keys/smithfarm-d0 ec2-user@ROOT_MASTER_PUBLIC_IP
Last login: Wed Sep 13 19:42:59 2017 from 193.165.237.27
This is the Salt Master.

Have a lot of fun...
ec2-user@ip-10-0-0-10:~> sudo -s
ip-10-0-0-10:/home/ec2-user # cd /srv/salt

The /srv/salt directory contains the contents of https://github.com/smithfarm/susecon-salt-master.git (master branch). This is a set of Salt state files to facilitate deployment of local Salt clusters in each Delegate Cluster and then using DeepSea to install Ceph in the Delegate Cluster. Before anything else, apply the bootstrap state on all minions::

# salt '*' state.apply bootstrap

The bootstrap state is quite busy, but from the user's perspective it creates a cephadm user on all the delegate nodes, with the possibility to ssh as cephadm to any node from the root master. For example, assuming Delegate 3's "admin" (local Salt Master) node is ip-10-0-3-10, we can ssh to it like so::

ip-10-0-0-10:/home/ec2-user # ssh cephadm@ip-10-0-3-10
Last login: Wed Sep 13 20:12:11 2017 from 10.0.0.10

This is the admin node.

cephadm@ip-10-0-3-10:~>

After applying the bootstrap state, we continue by applying the deepsea-salt-master state to all nodes with the "role:admin" grain (this is assuming the Delegate "admin" role will be used for the local Salt Master)::

# salt -G 'role:admin' state.apply deepsea-salt-master

This clones the DeepSea git repo into /home/cephadm/DeepSea and installs DeepSea and its dependencies. In the final step, we will run one of the scripts in /home/cephadm/DeepSea/qa to actually deploy Ceph, but let's not get ahead of ourselves. Next, we apply the deepsea-salt-minion state to point the Delegate Minions to their new master. Since the local master node is also a minion, we can simply apply it to all nodes::

# salt '*' state.apply deepsea-salt-minion

Or to all nodes belonging to a certain Delegate::

# salt -G 'delegate:3' state.apply deepsea-salt-minion

After this step, we can no longer ping or otherwise control these nodes, so their keys should be deleted. For example, to delete all minion keys belonging to Delegate 3::

# salt-key -d 'ip-10-0-3-*'
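
To see which keys such a glob would select before deleting anything, the matching can be tried out in Python. This is a rough sketch under two assumptions: that Delegate N's nodes use the default EC2 hostname (ip-10-0-N-x) as their minion ID, as in the Delegate 3 example above, and that the key list shown is purely hypothetical. The helper name ``delegate_minion_glob`` is not part of the software.

```python
import fnmatch


def delegate_minion_glob(delegate):
    """Glob matching the minion IDs of one Delegate, e.g. 'ip-10-0-3-*'.

    Assumption: Delegate N's nodes use the default EC2 hostname
    (ip-10-0-N-x) as their minion ID.
    """
    return "ip-10-0-{}-*".format(delegate)


# Hypothetical list of keys accepted on the root master.
keys = ["ip-10-0-3-10", "ip-10-0-3-11", "ip-10-0-4-10", "ip-10-0-30-10"]
pattern = delegate_minion_glob(3)

# The trailing "-" in the pattern keeps Delegate 30 out of the match.
print(fnmatch.filter(keys, pattern))  # ['ip-10-0-3-10', 'ip-10-0-3-11']
```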

Run DeepSea to deploy Ceph

The final step is to run DeepSea on each Delegate's local master node ("admin node"). Since we have lost the root master's connection to the Delegate Minions, we have no choice but to ssh to each local master in turn, accept the minion keys, and run the script. The deepsea-salt-master state installs a /home/cephadm/bin/health-ok shell script to make this easier::

ip-10-0-0-10:/home/ec2-user # ssh cephadm@ip-10-0-3-10
Last login: Wed Sep 13 20:12:11 2017 from 10.0.0.10

This is the admin node.

cephadm@ip-10-0-3-10:~> bin/health-ok

Other miscellaneous notes

Package Updates

Once a SLES image boots up, the first thing you need to do is "zypper up". One nice feature of AWS is that it has its own internal SMT server. However, it takes some seconds after boot for the associated zypper service to appear. Therefore, we use the following loop in the user-data script::

while sleep 10 ; do
    zypper services | grep 'SMT-http_smt-ec2_susecloud_net'
    if [[ $? = 0 ]] ; then
        break
    fi
done

After that completes, you can assume that the basic repos are available, so you can do "zypper up" as follows::

while sleep 5 ; do
    zypper -n update
    if [[ $? = 0 ]] ; then
        break
    fi
done
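
Both loops follow the same pattern: sleep, test, repeat until the test succeeds. For comparison, here is a minimal Python sketch of that pattern. The helper names (``wait_until``, ``smt_service_present``, ``zypper_update_ok``) are hypothetical, and the user-data script itself uses the shell loops shown above.

```python
import subprocess
import time


def wait_until(check, interval, sleep=time.sleep):
    """Sleep `interval` seconds, call `check()`, repeat until it is true.

    Returns the number of attempts that were needed.
    """
    attempts = 0
    while True:
        sleep(interval)  # like "while sleep 10": wait first, then test
        attempts += 1
        if check():
            return attempts


def smt_service_present():
    """True once the SMT service shows up in the `zypper services` output."""
    result = subprocess.run(["zypper", "services"], stdout=subprocess.PIPE,
                            stderr=subprocess.DEVNULL, universal_newlines=True)
    return "SMT-http_smt-ec2_susecloud_net" in result.stdout


def zypper_update_ok():
    """True if a non-interactive `zypper update` exits with status 0."""
    return subprocess.call(["zypper", "-n", "update"]) == 0
```

On a SLES instance, this would be used as ``wait_until(smt_service_present, interval=10)`` followed by ``wait_until(zypper_update_ok, interval=5)``. Like the shell loops, the sketch has no upper bound on the number of attempts.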

SUSE Enterprise Storage repos

Unfortunately, the AWS SMT server only has the basic SLES pool and update repos; it carries neither SUSE Enterprise Storage nor any other add-ons. So we have to make our own installation sources. The way I ended up doing that was to loop-mount the SES2 GA ISO on the Salt Master and run an apache2 server there to serve it to the delegate instances.

First, append an entry for the ISO to /etc/fstab::

$MEDIA_FULL_PATH /srv/repos/SES2-media1 iso9660 loop 0 0

Second, mount the ISO::

mount /srv/repos/SES2-media1

Third, set up Apache::

# zypper in apache2
# systemctl enable apache2.service
# echo "I am a puppet" > /srv/repos/puppet.txt
# vim /etc/apache2/vhosts.d/admin.conf

<VirtualHost *:80>
    ServerAdmin webmaster@example.com
    ServerName admin
    DocumentRoot /srv/repos
    HostnameLookups Off
    UseCanonicalName Off
    ServerSignature On
    <Directory /srv/repos>
        Options Indexes FollowSymLinks
        AllowOverride All
        Require all granted
    </Directory>
</VirtualHost>

# systemctl restart apache2.service
# curl http://localhost/puppet.txt
I am a puppet

Fourth, try the curl command from another machine in the cluster.

Fifth, add the repo on the cluster nodes::

# zypper ar http://localhost/SES2/ SES2
Adding repository 'SES2' ......................................................[done]
Repository 'SES2' successfully added
Enabled     : Yes                  
Autorefresh : No                   
GPG Check   : Yes                  
URI         : http://localhost/SES2

Sixth, install Ceph packages from the ISO on the cluster nodes (use SaltStack for this).

Logging user-data script output

Source: https://alestic.com/2010/12/ec2-user-data-output/

As the user-data script runs, its output is logged to a file called::

/var/log/cloud-init-output.log

Adding tags to instances after run_instances

http://stackoverflow.com/questions/8070186/boto-ec2-create-an-instance-with-tags
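
With boto, tags can be attached once run_instances has returned a reservation. The following is a minimal sketch, assuming a boto 2 ``EC2Connection`` object (which provides ``create_tags``) and the ``Reservation`` returned by ``run_instances``. The helper name ``tag_instances`` and the tag names in the usage comment are hypothetical and not taken from the ceph-auto-aws code.

```python
def tag_instances(conn, reservation, tags):
    """Apply `tags` to every instance in a boto `Reservation`.

    `conn` is assumed to be a boto 2 EC2Connection and `reservation` the
    object returned by its run_instances() method.
    Returns the list of instance IDs that were tagged.
    """
    instance_ids = [instance.id for instance in reservation.instances]
    conn.create_tags(instance_ids, tags)
    return instance_ids


# Usage (not run here; requires AWS credentials, and the tag names are
# hypothetical):
#
#   reservation = conn.run_instances(image_id, ...)
#   tag_instances(conn, reservation, {"delegate": "3", "role": "admin"})
```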

SaltStack notes

Ping all machines belonging to a given delegate::

salt -G 'delegate:12' test.ping

Get IP addresses of all machines belonging to the delegate::

salt -G 'delegate:12' network.ip_addrs

Compound match: get IP address of Delegate 12's admin node::

salt -C 'G@delegate:12 and G@role:admin' network.ip_addrs
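
When such commands are generated from a script, the compound matcher can be assembled from grain names and values. This is a small sketch, assuming the "delegate" and "role" grains used in the examples above; the helper names ``grain_target`` and ``salt_command`` are hypothetical.

```python
def grain_target(**grains):
    """Build a Salt compound matcher that ANDs together grain matches."""
    return " and ".join(
        "G@{}:{}".format(name, value) for name, value in sorted(grains.items())
    )


def salt_command(target, function):
    """Argument list for running `function` on a compound `target`."""
    return ["salt", "-C", target, function]


target = grain_target(delegate=12, role="admin")
print(target)  # G@delegate:12 and G@role:admin
print(salt_command(target, "network.ip_addrs"))
```

Passing the result as an argument list (for example to ``subprocess.call``) avoids having to quote the matcher for the shell.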

Windows change administrator password via user-data script
