
An overview of experiments O'Reilly Media is doing with Docker

Docker Experiments at O'Reilly Media

Docker Boston Meetup

July 30, 2014

Andrew Odewahn ([email protected])

This presentation is at http://bit.ly/docker-at-oreilly

About O'Reilly Media

  • O'Reilly Media produces technical books and events
  • Both "Web 2.0" and "Open Source" were terms that came from O'Reilly events
  • Founded in Cambridge, MA, but now headquartered in Sebastopol, CA (north of San Francisco)

At heart, O'Reilly is a learning company

"Spreading the knowledge of innovators" -- O'Reilly exists to take the knowledge in an expert's head and package it up so that other people can learn it.

The way people want to learn is changing radically.

  • How do we create media products like these using our core capabilities (editorial, brand, community)?
  • How do we transform as media increasingly becomes software?
  • We are exploring Docker to help us make new kinds of media products

How do we respond to demand for IPython Notebooks?

  • IPython Notebooks are becoming the de facto tool in the scientific and big data science communities
  • They provide an authoring and execution environment for text, math, and arbitrary code (Python, Julia, R, Ruby, and more)
  • Strong demand among our authors to support this format
  • Plus, it's awesome

How we're using Docker to...

  • help authors create them
  • produce them (edit, copyedit, illustrate, index, etc.)
  • distribute them in a way that makes a compelling experience

Experiment 1: Packaging the examples for Python for Data Analysis as a Docker image

  • Successful book in the data science area, published in 2012
  • This is a rapidly changing area
  • Create a companion product as an IPython Notebook

DEMO

The key steps are:

  • If you are running boot2docker, forward port 8888 from the VirtualBox VM to your machine
VBoxManage controlvm boot2docker-vm natpf1 "ipython-notebook,tcp,127.0.0.1,8888,,8888"
  • Pull the image
sudo docker pull odewahn/python-data-analysis
  • Start the container, and be sure to expose port 8888
sudo docker run -i -t -p 8888:8888 odewahn/python-data-analysis /bin/bash
  • Once the container starts and you're at the bash prompt, start the server with this command:
./start.sh
  • Go to localhost:8888 on your local browser
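
Before opening the browser, it can help to confirm that something is actually listening on the published port. The sketch below is not part of the original demo: port_is_open is a hypothetical helper name, and the host and port simply mirror the -p 8888:8888 mapping used above.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 8888 is the port published by `docker run -p 8888:8888` above.
    if port_is_open("127.0.0.1", 8888):
        print("Notebook server is reachable at http://localhost:8888")
    else:
        print("Nothing is listening on port 8888 yet")
```

If the check fails on boot2docker, the port-forwarding rule on the VM is a likely place to look first.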

How do we go beyond companion pieces and make actual products?

  • Companion products are great, but how do we make actual products themselves?
  • We use an internally developed tool called O'Reilly Atlas for 80% of our content.

Atlas has 3 core concepts

A single source of semantically rich content

Version control in Git

  • All Atlas content is stored in Git.
  • This presentation was created in Atlas and posted to GitHub.

Transformation engines to create formats for consumption

Experiment 2: Just Enough Math

  • A combination book, video series, and tutorial
  • Delivered as an IPython Notebook created in Atlas

The project was written and produced in Atlas

  • Code samples that are tagged as "Executable" will be runnable in the browser

An Atlas to IPython Notebook conversion gem

  • The atlas2ipynb gem transforms HTMLBook into IPython Notebook's JSON-based format
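
To make the conversion concrete, here is a minimal Python sketch of the idea. It is not the gem's actual (Ruby) implementation. It assumes HTMLBook-style <p> and <pre> elements, assumes that a data-executable="true" attribute marks runnable samples, and targets the nbformat 4 JSON layout, which may differ from what the gem emitted in 2014. The name html_to_notebook is hypothetical.

```python
import json
from html.parser import HTMLParser


class _CellCollector(HTMLParser):
    """Turn <p> text into markdown cells and <pre> text into code or markdown cells."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.cells = []
        self._tag = None          # element currently being captured
        self._executable = False  # whether the current <pre> is runnable
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in ("p", "pre") and self._tag is None:
            self._tag = tag
            self._text = []
            # Assumption: executable samples carry data-executable="true".
            self._executable = dict(attrs).get("data-executable") == "true"

    def handle_data(self, data):
        if self._tag is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return  # inline markup such as </em>, or an element we skip
        text = "".join(self._text).strip("\n")
        if tag == "pre" and self._executable:
            self.cells.append({
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": text,
            })
        elif tag == "pre":
            # Non-executable listings become indented (literal) markdown.
            indented = "\n".join("    " + line for line in text.split("\n"))
            self.cells.append({"cell_type": "markdown", "metadata": {}, "source": indented})
        else:
            self.cells.append({"cell_type": "markdown", "metadata": {}, "source": text.strip()})
        self._tag = None


def html_to_notebook(html):
    """Return a notebook in the nbformat 4 layout as a plain dict."""
    collector = _CellCollector()
    collector.feed(html)
    collector.close()
    return {"cells": collector.cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 0}


sample = (
    '<p>Add two numbers.</p>'
    '<pre data-type="programlisting" data-executable="true">print(1 + 1)</pre>'
)
print(json.dumps(html_to_notebook(sample), indent=1))
```

Writing the returned dict to a .ipynb file with json.dump should give a file the notebook server can open, with the paragraph as a text cell and the tagged sample as a runnable code cell.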

A Dockerfile for the base image with IPython Notebooks and the atlas2ipynb toolchain pre-installed

FROM ubuntu
MAINTAINER Andrew Odewahn "[email protected]"

RUN apt-get update
RUN apt-get install -y ruby1.9.3
RUN apt-get install -y python-software-properties python-dev python-pip
RUN apt-get install -y libfreetype6-dev libpng-dev libncurses5-dev vim git-core build-essential curl unzip wget

# Install Atlas-specific gems
RUN gem install bundler atlas-api atlas2ipynb

# Install ipython notebook requirements
RUN pip install --upgrade pip
ADD requirements.txt /tmp/requirements.txt
RUN pip install numpy==1.7.1
RUN pip install -r /tmp/requirements.txt --allow-unverified matplotlib --allow-all-external

#
# Create the command to actually run the ipython notebook
#
RUN adduser --disabled-password --home=/home/atlas --gecos "" atlas
USER atlas
WORKDIR /home/atlas
RUN echo '#!/bin/sh' > start.sh
RUN echo 'ipython notebook --ip=0.0.0.0 --port=8888 --pylab=inline --no-browser'  >> start.sh
RUN chmod +x start.sh

#
# Set us back to the root user
#
USER root

A Dockerfile for Just Enough Math (or any book, for that matter)

FROM odewahn/atlas-base
MAINTAINER Andrew Odewahn "[email protected]" 

#
# Install systemwide requirements
#

RUN apt-get install -y libatlas-base-dev 
RUN apt-get install -y gfortran 
RUN apt-get install -y gcc-multilib
RUN apt-get install -y lynx 
RUN apt-get install -y emacs23-nox 
RUN apt-get install -y glpk 
RUN apt-get install -y python-glpk


#
# Install python packages using pip
#
RUN pip install scipy
RUN pip install neurolab
RUN pip install hyperloglog                                      
RUN pip install countminsketch
RUN pip install pybloom               
RUN pip install lshash


#
# Install content using atlas-api to build the project
# Be sure to set ATLAS_KEY as an environment variable!
#   export ATLAS_KEY=<your atlas API key>
#
USER atlas
WORKDIR /home/atlas
RUN atlas2ipynb $ATLAS_KEY odewahn/jem-docker

"docker push" is the new publishing

docker build --tag odewahn/jem-tutorial .
docker push odewahn/jem-tutorial

DEMO

  • Pull the image
sudo docker pull odewahn/jem-tutorial
  • Start the container, and be sure to expose port 8888
sudo docker run -i -t -p 8888:8888 odewahn/jem-tutorial /bin/bash
  • Once the container starts and you're at the bash prompt, start the server with this command:
./start.sh
  • Go to localhost:8888 on your local browser

This experience leaves a lot to be desired

  • One of the first projects was "Kids Code," which teaches kids about Python
  • "OK kids, let's fire up an Ubuntu Virtual Machine and do some coding!" doesn't work well
  • Even for pros, this is a bit intimidating
    • VMs and Vagrant are unfamiliar
    • Windows does not include an SSH client...

Experiment 3: Towards a more seamless experience

  • O'Reilly Pyxie is a place where authors can put Docker images for distribution
  • Inspired by Nick Stinemates's any-sass project
    • Frontend app starts a container based on an image you choose
    • Container is mapped to a URL using Hipache and returned to the user
    • User runs the container by going to the URL
  • Super-duper pre-alpha proof of concept
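
The three-step flow above can be sketched in Python as a function that plans a session without running anything. This is an illustration under stated assumptions, not Pyxie's actual code: plan_session, the example.com domain, and the backend address are all hypothetical, and the container is assumed to start its server with ./start.sh on port 8888 as in the earlier images. The Redis layout follows Hipache's documented convention of a list named frontend:<hostname> whose first element is an identifier and whose remaining elements are backend URLs.

```python
import uuid


def plan_session(image, host_port, base_domain="example.com",
                 backend_host="127.0.0.1", session_id=None):
    """Return the URL for a new session plus the commands that would create it."""
    session_id = session_id or uuid.uuid4().hex[:8]
    hostname = "{}.{}".format(session_id, base_domain)
    backend = "http://{}:{}".format(backend_host, host_port)

    # 1. Start a detached container, publishing the notebook port (8888).
    docker_cmd = ["docker", "run", "-d", "-p", "{}:8888".format(host_port),
                  image, "./start.sh"]

    # 2. Register the route in Redis so Hipache can proxy the hostname.
    key = "frontend:" + hostname
    redis_cmds = [
        ["redis-cli", "rpush", key, session_id],  # identifier
        ["redis-cli", "rpush", key, backend],     # backend URL
    ]

    # 3. The URL handed back to the user.
    return {"url": "http://" + hostname, "docker": docker_cmd, "redis": redis_cmds}


plan = plan_session("odewahn/jem-tutorial", 49153, session_id="demo01")
print(plan["url"])
```

Returning the commands as argument lists keeps the sketch free of side effects; a real frontend would pass them to subprocess.run or talk to the Docker and Redis APIs directly.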

DEMO -- Pyxie.io

Lots of caveats

  • Scalability is a HUGE issue
  • Exploring many solutions for hosting images
  • Security issues in running untrusted code
  • Persistence and state
  • Skills -- finding people who are familiar with these tools is challenging

For more info

A quick survey

Questions / Comments