
Like Apache Storm, but smaller and for Elixir

Tempest

Framework for distributed job topologies in Elixir (heavily influenced by Apache Storm).

The goal of this project is to make it very easy to create and run parallel/distributed job topologies. If you want something more difficult, just use Apache Storm.

Installation

If available in Hex, the package can be installed as:

  1. Add tempest to your list of dependencies in mix.exs:
def deps do
  [{:tempest, "~> 0.0.1"}]
end
  2. Ensure tempest is started before your application:
def application do
  [applications: [:tempest]]
end

Quickstart

IMPORTANT: You need git-lfs installed to run the examples.

git clone https://github.com/cjbottaro/tempest.git
cd tempest
MIX_ENV=examples mix deps.get
MIX_ENV=examples mix compile
MIX_ENV=examples mix run examples/benchmark.exs
MIX_ENV=examples mix run examples/appender.exs /usr/share/dict/words
MIX_ENV=examples mix run examples/word_count.exs examples/data/big.txt
MIX_ENV=examples mix run examples/distributed_join.exs

Building your first topology

What is a topology? A topology describes a computation: a pipeline of jobs. It's a graph that specifies each job's dependencies and amount of parallelism.

So let's make a very simple topology that reads a file from the filesystem, then outputs each line with "!!" appended to it.

The topology will have three processors, linked together in a chain: Reader → Appender → Printer.

Processors are conceptually just functions. The input of a processor is a single tuple and the output is zero or more tuples.

When I say tuple, I really just mean any kind of struct, map, list, tuple, primitive type, etc. In Ruby or Python, it would just be an "object".
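The "one tuple in, zero or more tuples out" contract maps naturally onto Enum.flat_map/2. As a framework-free sketch (plain Elixir, no Tempest involved; the module and function names here are made up for illustration), a pipeline of processors could be modeled like this:

```elixir
defmodule PipelineSketch do
  # A processor is conceptually a function from one tuple to a list of
  # zero or more tuples. Each stage below is just a plain function.

  # Reader-like stage: one file path in, many lines out.
  def read(path) do
    path
    |> File.stream!()
    |> Enum.map(&String.trim_trailing/1)
  end

  # Appender-like stage: one line in, exactly one line out.
  def append(line), do: ["#{line}!!"]

  # A filter is a processor that emits zero tuples for some inputs.
  def drop_empty(""), do: []
  def drop_empty(line), do: [line]
end

# Chaining stages with flat_map mirrors how tuples flow through a topology:
["a", "", "b"]
|> Enum.flat_map(&PipelineSketch.drop_empty/1)
|> Enum.flat_map(&PipelineSketch.append/1)
# => ["a!!", "b!!"]
```

Tempest's processors add concurrency and linking on top of this idea, but the underlying contract is the same.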

So let's build the reader processor. The input will be a file path and the output will be the lines in that file.

defmodule Reader do
  use Tempest.Processor

  def process(context, file_name) do
    stream = File.stream!(file_name, [:read])
    Enum.each stream, fn line ->
      line = String.trim_trailing(line)
      emit(context, line)
    end
  end
end

Pretty simple, huh?

Don't worry about context; it's like the opaque and ubiquitous conn in Phoenix.

Let's build the appender processor next.

defmodule Appender do
  use Tempest.Processor

  def process(context, line) do
    emit(context, "#{line}!!")
  end
end

Then the printer processor.

defmodule Printer do
  use Tempest.Processor

  def process(context, exclaimed_line) do
    IO.puts(exclaimed_line)
  end
end

Now let's build the topology.

alias Tempest.Topology

topology = Topology.new
  |> Topology.add_processor(:my_reader, Reader)
  |> Topology.add_processor(:the_appender, Appender, concurrency: 2)
  |> Topology.add_processor(:our_printer, Printer)

  |> Topology.add_link(:my_reader, :the_appender)
  |> Topology.add_link(:the_appender, :our_printer)

Notice that we name each processor in the calls to add_processor so that we can link them by name in the calls to add_link.

Also notice we passed concurrency: 2 to the Appender processor. That means it will run in two processes, so our topology is really Reader → two parallel Appenders → Printer.
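Tempest manages those processes for you, but the effect of concurrency: 2 is analogous to fanning work out across two concurrent workers. Purely as an illustration (this is plain Elixir, not Tempest internals), Task.async_stream/3 shows the same idea:

```elixir
# Illustration only: two concurrent workers append "!!" to lines,
# similar to an Appender stage running with concurrency: 2.
lines = ["apple", "banana", "cherry"]

results =
  lines
  |> Task.async_stream(fn line -> "#{line}!!" end, max_concurrency: 2)
  |> Enum.map(fn {:ok, line} -> line end)

# Task.async_stream preserves input order by default, so results is
# ["apple!!", "banana!!", "cherry!!"] even though work ran concurrently.
```

In Tempest the framework also decides how tuples are routed to the two Appender processes, which is why the links are declared between named processors rather than individual processes.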

Now let's run the topology...

topology = topology
  |> Topology.begin_computation
  |> Topology.emit(:my_reader, "/usr/share/dict/words")
  |> Topology.end_computation

... and it should print each line of /usr/share/dict/words with !! appended.

Statistics (finding bottlenecks in your topology)

To print out stats about your computation, just tack on...

topology
  |> Tempest.Stats.get
  |> Tempest.Stats.pretty_print

... and you should be able to analyze any bottlenecks in your computation.

TODO: explain the stats.

Stdlib of processors

The main goal of Tempest is ease of use, so it ships with a standard library of processors. A lot of topologies can be built without defining any custom processors (including the README example and everything in the examples/ dir); just use the processors in Tempest.ProcessorLib.

Goals (TODOs)

  • Distribute processors; currently all processors run in a single BEAM VM... :(
  • Rename "processors" to something more interesting.
  • A lot more error handling (enhance user experience).