Run Just Once Please: runjop

RunJOP (Run Just Once Please) is a distributed execution framework to run a command (i.e. a job) only once in a group of servers.

Some possible use cases are:

  • It can be used together with UNIX/Linux cron to make a crontab schedule highly available (HA).

  • To execute a batch job on only one of the EC2 instances in an Auto Scaling group.

  • To execute a command in HA on multiple nodes after an SNS notification is received, using the MessageId as the id of the job to make sure it is executed only once (see the sketch after this list).
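As an illustration of the SNS use case above, a notification handler could invoke runjop with the MessageId as the job id. This is only a sketch: the handler wiring and the message structure are assumptions, not part of runjop.

# Hypothetical sketch: run a command at most once per SNS notification
# by using the notification's MessageId as the runjop job id.
import subprocess

def on_sns_notification(message):
    # 'message' is assumed to be the parsed SNS notification payload,
    # which carries a unique 'MessageId' field.
    subprocess.call([
        "/somepath/runjop.py",
        "--region=eu-west-1",
        "--table", "myschedule",
        "--id", message["MessageId"],  # unique per notification
        "echo Hello World",
    ])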

Some features and internals:

  • The idea is to use Amazon DynamoDB to make sure only one server "reserves" the right to execute the command for a certain range of time (a sketch of this idea follows the list).

  • Amazon S3 can optionally be used to consolidate the logs of the jobs in a single repository.

  • AWS credentials can be passed using the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.

  • On an EC2 instance, an IAM role can be used to grant access to the DynamoDB/S3 resources.
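To illustrate the reservation idea from the list above, here is a minimal sketch of a conditional DynamoDB write, assuming boto3 (runjop itself was written against the older boto library, and its actual table schema may differ):

# Sketch of the "reserve" idea: the conditional put succeeds on exactly
# one node within the range, so only that node runs the command.
# Assumes boto3 and a table keyed on 'job_id'; not runjop's actual schema.
import datetime
import boto3
from botocore.exceptions import ClientError

def try_reserve(table_name, job_id, range_seconds, node, region):
    table = boto3.resource("dynamodb", region_name=region).Table(table_name)
    now = datetime.datetime.utcnow()
    cutoff = now - datetime.timedelta(seconds=range_seconds)
    fmt = "%Y-%m-%d %H:%M:%S"
    try:
        table.put_item(
            Item={"job_id": job_id, "node": node, "time": now.strftime(fmt)},
            # Succeed only if no node reserved the job within the range;
            # timestamps compare correctly as strings in this fixed format.
            ConditionExpression="attribute_not_exists(job_id) OR #t < :cutoff",
            ExpressionAttributeNames={"#t": "time"},
            ExpressionAttributeValues={":cutoff": cutoff.strftime(fmt)},
        )
        return True   # this node won the reservation: execute the command
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # another node already reserved the job: skip it
        raise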

This is a personal project. No relation whatsoever exists between this project and my employer.

License

Copyright (c) 2013 Danilo Poccia, http://blog.danilopoccia.net

This code is licensed under the MIT License (MIT). Please see the LICENSE file that accompanies this project for the terms of use.

Introduction

Running these two commands concurrently on two hosts, one of the nodes executes the command and the other does not. In this example the command is executed on the "second" node. Debug information is included to give more detail about the execution.

In this example, on the "first" node the "Hello World" command is not executed:

runjop.py --region=eu-west-1 --table myschedule --id my-job --range=10 --node first --s3=s3://BUCKET/mylogs "echo Hello World" --log /tmp/runjop.log -d
DEBUG:runjop:__init__ '{'node': 'first', 's3log': 's3://BUCKET/mylogs', 'region': 'eu-west-1', 'range': '10', 'debug': True, 'table': 'myschedule', 'logfile': '/tmp/runjop.log', 'id': 'my-job'}'
INFO:runjop:S3 bucket: 'BUCKET'
INFO:runjop:S3 prefix: 'mylogs/'
DEBUG:runjop:table 'myschedule' not found
INFO:runjop:table 'myschedule' created
DEBUG:runjop:waiting for table 'myschedule' to be active
DEBUG:runjop:table 'myschedule' is active
DEBUG:runjop:run '['echo Hello World']'
DEBUG:runjop:now = '2013-02-11 16:03:46'
DEBUG:runjop:last_item = '{u'node': u'second', u'counter': 1, u'job_id': u'my-job', u'time': u'2013-02-11 16:03:46'}'
DEBUG:runjop:last_time_str = '2013-02-11 16:03:46'
DEBUG:runjop:counter = '1'
DEBUG:runjop:outside of range of 10 seconds: False
INFO:runjop:not outside of range of execution
INFO:runjop:command not executed

On the "second" node, the "Hello World" command is executed:

runjop.py --region=eu-west-1 --table myschedule --id my-job --range=10 --node second --s3=s3://BUCKET/mylogs "echo Hello World" --log /tmp/runjop.log -d
DEBUG:runjop:__init__ '{'node': 'second', 's3log': 's3://BUCKET/mylogs', 'region': 'eu-west-1', 'range': '10', 'debug': True, 'table': 'myschedule', 'logfile': '/tmp/runjop.log', 'id': 'my-job'}'
INFO:runjop:S3 bucket: 'BUCKET'
INFO:runjop:S3 prefix: 'mylogs/'
DEBUG:runjop:table 'myschedule' found
DEBUG:runjop:waiting for table 'myschedule' to be active
DEBUG:runjop:table 'myschedule' is active
DEBUG:runjop:run '['echo Hello World']'
DEBUG:runjop:now = '2013-02-11 16:03:46'
DEBUG:runjop:outside of range of 10 seconds: True
DEBUG:runjop:put result '{u'ConsumedCapacityUnits': 1.0}'
DEBUG:runjop:execute_job 'True'
INFO:runjop:executing command 'echo Hello World'
INFO:runjop:returncode = 0
INFO:runjop:output:
Hello World

INFO:runjop:output written on s3://BUCKET/mylogs/myschedule-my-job-20130211-160346-second-0.log

On DynamoDB the "myschedule" table can be used as an activity log:

job_id    counter  node      time 
"my-job"  1        "second"  "2013-02-11 16:03:46"
"my-job"  2        "first"   "2013-02-11 16:08:52"

The optional S3 log has the following naming convention:

{table}-{id}-{YYYYMMDD}-{hhmmss}-{node}-{returncode}.log
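For example, the log name for the run shown above could be composed like this (a sketch; the variables are hypothetical, not runjop internals):

# Sketch: composing an S3 log key under the naming convention above.
from datetime import datetime

table, job_id, node, returncode = "myschedule", "my-job", "second", 0
stamp = datetime.utcnow().strftime("%Y%m%d-%H%M%S")
key = "mylogs/{0}-{1}-{2}-{3}-{4}.log".format(table, job_id, stamp, node, returncode)
# e.g. mylogs/myschedule-my-job-20130211-160346-second-0.log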

Using with cron

The previous example can be scheduled using cron on more than one host, but only one of them will actually run the job.

In this example, two options are removed from the invocation of the tool (compared to the previous example):

  • without the "--node" option the hostname of each node is used
  • without the "--range" option the default 300 seconds (5 minutes) range is used.

E.g. to execute the job at one minute past midnight (00:01), every day of the month, every day of the week:

1 0 * * *  /somepath/runjop.py --region=eu-west-1 --table myschedule --id my-job --s3=s3://BUCKET/mylogs "echo Hello World" --log /var/log/runjop.log

E.g. to execute the job every two hours, namely at midnight, 2am, 4am, 6am, 8am, and so on:

0 */2 * * *  /home/username/runjop.py --region=eu-west-1 --table myschedule --id my-job --s3=s3://BUCKET/mylogs "echo Hello World" --log /var/log/runjop.log

Full Usage

Usage: runjop.py [options] "<command(s)>"

RunJOP (Run Just Once Please)

A distributed execution framework to run a command (i.e. a job) only once in a group of servers.
This can be used together with UNIX/Linux cron to put a crontab schedule in High Availability (HA).
The idea is to use Amazon DynamoDB to make sure only one server "reserves" the right
to execute the command for a certain range of time.
Amazon S3 can optionally be used to consolidate the logs of the jobs in a single repository.

Options:
  -h, --help       show this help message and exit
  --region=REGION  AWS region to use for DynamoDB (default is us-east-1)
  --table=TABLE    the DynamoDB table used to check concurrency and log job
                   executions (a new table is created if not found)
  --id=ID          the unique ID identifying this job across multiple servers
  --node=NODE      an identifier for the node (default is the current
                   'hostname' of the node)
  --range=S        the range of time (in seconds) in which the execution of
                   the job must be unique (default is 300 seconds)
  --s3=URL         the optional S3 path to put the output of the job, in
                   s3://BUCKET[/PATH] format
  --log=FILE       the local filename to use for logs
  -d, --debug      print debug information