A CSV data digester and organizer

Organ

Organ is a tabular data digester and organizer. You can install the Python module and command-line tools with either easy_install or pip:

$ [sudo] easy_install organ
# or
$ [sudo] pip install organ

Or clone the repo and run the setup.py script:

$ python setup.py install

csvfilter

A tool for performing map and filter operations on CSV data:

Usage: csvfilter [options] [--filter <FILTER>] [--map <MAP>] [<CSV>]

Options:
  -h, --help            show this help message and exit
  -F FILTER_EXPR, --filter=FILTER_EXPR
                        An optional Python expression by which to filter input
                        data, evaluated with each row's keys as local
                        variables, e.g. "DayOfWeek == 'Monday'"
  -m MAP_EXPR, --map=MAP_EXPR
                        An expression describing which keys to write to the
                        output and, for each, an optional expression to
                        evaluate. This is like a SQL SELECT clause, except
                        with "=" instead of the "AS" keyword: "foo=Foo"
                        lowercases the "Foo" column and excludes other
                        columns; "*,date=DateTime[0:10]" copies all columns
                        and creates a new "date" column containing the first
                        10 chars of the "DateTime" column.
  -d DIALECT, --dialect=DIALECT
                        The CSV dialect used to read and write data files, per
                        Python's csv module (default: "excel")
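
To make the two main options concrete, here is a rough plain-Python equivalent of running csvfilter with --filter "DayOfWeek == 'Monday'" and --map "*,date=DateTime[0:10]". It is only a sketch using the standard library (Python 3), not how csvfilter is actually implemented; the sample data and the filter_and_map name are made up for illustration.

```python
import csv
import io

# Hypothetical sample input; the column names mirror the help text examples.
SAMPLE = (
    "DayOfWeek,DateTime\n"
    "Monday,2013-01-07 09:30:00\n"
    "Tuesday,2013-01-08 10:00:00\n"
    "Monday,2013-01-14 11:15:00\n"
)

def filter_and_map(csv_text):
    """Keep Monday rows and add a "date" column, returning CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    fieldnames = list(reader.fieldnames) + ["date"]
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if row["DayOfWeek"] == "Monday":         # --filter "DayOfWeek == 'Monday'"
            row["date"] = row["DateTime"][0:10]  # --map "*,date=DateTime[0:10]"
            writer.writerow(row)
    return out.getvalue()

result = filter_and_map(SAMPLE)
print(result)
```

The Tuesday row is dropped, and each remaining row keeps its original columns plus the new "date" column.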

csvorganize

Usage: 
    csvorganize [options] (--key | -k) <KEY> [<CSV>]
    csvorganize [options] (--key-expr | -K) <KEY EXPRESSION> [<CSV>]

csvorganize takes a single CSV filename (or reads CSV from stdin) and runs a
"key function" on each row to generate a filesystem path to which the row
should be written, grouping rows with the same "key" into smaller collections.
Some examples:

Organize a CSV file with Year, Month and Date columns into nested
subdirectories of CSV data:

    csvorganize --key "{Year}/{Month}/{Date}" path/to/dates.csv

Classify geographic statistics, e.g. in a table that contains rows for
states, cities and zip codes:

    csvorganize --filter "Region == 'State'"  --key "states/{State}"
    csvorganize --filter "Region == 'City'"   --key "states/{State}/cities"
    csvorganize --filter "Region == 'Zip'"    --key "states/{State}/cities/{City}/zips"

Options:
  -h, --help            show this help message and exit
  -k KEY, --key=KEY     The key template string, with interpolated keys
                        wrapped in {}, e.g. "states/{State}"
  -K KEY_EXPR, --key-expr=KEY_EXPR
                        Alternatively, you can provide a key expression, which
                        is evaluated as Python code with the row's keys as
                        local variables, e.g. "State[0:2].upper()"
  -f FILENAME, --filename=FILENAME
                        The format of the output filename for each unique
                        key's values. This should be a Python formatting
                        string in which %s is replaced with the key, e.g.
                        "%s.txt" (default: "%s.csv")
  -d DIALECT, --dialect=DIALECT
                        The CSV dialect used to read and write data files, per
                        Python's csv module (default: "excel")
  -F FILTER_EXPR, --filter=FILTER_EXPR
                        An optional Python expression by which to filter input
                        data, evaluated with each row's keys as local
                        variables, e.g. "DayOfWeek == 'Monday'"
  -m MAP_EXPR, --map=MAP_EXPR
                        An expression describing which keys to write to the
                        output and, for each, an optional expression to
                        evaluate. This is like a SQL SELECT clause, except
                        with "=" instead of the "AS" keyword: "foo=Foo"
                        lowercases the "Foo" column and excludes other
                        columns; "*,date=DateTime[0:10]" copies all columns
                        and creates a new "date" column containing the first
                        10 chars of the "DateTime" column.
  -e, --empty           By default we discard rows for which the key
                        expression evaluates to an empty value. Setting this
                        flag forces the inclusion of empty keys, which will
                        likely produce unusual filenames (".csv").
  -r, --readonly        Don't write any files; just report the filenames and
                        the number of rows that would be written to each.
  -s SORT_ROWS, --sort=SORT_ROWS
                        Sort rows in the output by an expression with optional
                        "-" (descending) or "+" (ascending, the default)
                        order. Like --filter, the rest of the expression is
                        evaluated with the row's keys as local variables.
  -S SORT_KEYS, --sort-keys=SORT_KEYS
                        Sort the output keys, either by key ("+key", "-key")
                        or size ("+length", "-length"). This affects only the
                        order in which data files are written (and reported),
                        not their contents.
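
The grouping idea behind --key, --filename and --readonly can be sketched in a few lines of standard-library Python 3. This is an illustration under assumptions, not csvorganize's real code: the organize function and the sample data are hypothetical, and nothing is written to disk.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample input with Year, Month and Date columns.
SAMPLE = (
    "Year,Month,Date,Value\n"
    "2012,01,15,3\n"
    "2012,01,15,4\n"
    "2012,02,01,5\n"
)

def organize(csv_text, key_template, filename_format="%s.csv"):
    """Group rows by the path built from key_template; nothing is written."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = key_template.format(**row)   # e.g. "{Year}/{Month}/{Date}"
        if not key:
            continue  # mirrors the documented default of discarding empty keys
        groups[filename_format % key].append(row)
    return groups

groups = organize(SAMPLE, "{Year}/{Month}/{Date}")
for path in sorted(groups):
    # Similar in spirit to --readonly: report each path and its row count.
    print(path, len(groups[path]))
```

Rows that produce the same key end up under the same path, so the two rows for 2012/01/15 land together and the 2012/02/01 row gets its own group.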

import organ

The organ module provides a bunch of useful functions for working with data:

organ.expression(str)

Converts a Python expression into a function that can be called on a dict and evaluated with its keys as local variables. For example:

>>> test = organ.expression("foo > 1")
>>> test({'foo': 0})
False
>>> test({'foo': 2})
True

Organ expressions are really useful with filter() and map(). Think of them as a more powerful version of operator.itemgetter().
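
If you're curious how such a function can be built, here is a minimal stand-in based on Python's built-in compile() and eval(). It is a sketch of the general idea, not organ's actual implementation, and it is defined locally so the example runs without organ installed.

```python
def expression(source):
    """Compile a Python expression into a function of one dict (sketch)."""
    code = compile(source, "<expression>", "eval")

    def evaluate(row):
        # The row's keys are visible as local variables. eval() executes
        # arbitrary code, so only use this with trusted expressions.
        return eval(code, {}, dict(row))

    return evaluate

rows = [{"foo": 0}, {"foo": 2}, {"foo": 5}]
is_big = expression("foo > 1")

big_rows = list(filter(is_big, rows))             # rows where foo > 1
doubled = list(map(expression("foo * 2"), rows))  # one value per row
print(big_rows, doubled)
```

Compiling once and evaluating per row keeps the per-row cost low, which matters when the same expression is applied to every line of a large CSV file.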

organ.map_expression(str)

Organ's map expressions are kind of like a SQL SELECT clause in Python. You provide a string in the format:

key [= expression] [, key [= expression]]*

That is, one or more key and optional =expression clauses, separated by commas. (Whitespace is ignored around keys and expressions.) For example:

>>> transform = organ.map_expression("foo = bar + 1")
>>> transform({'bar': 2})
{'foo': 3}
>>> transform = organ.map_expression("state = State[0:2].upper(), country = 'US'")
>>> transform({'State': 'California'})
{'state': 'CA', 'country': 'US'}

Organ map expressions are, obviously, pretty useful with map().
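
For a feel of what happens under the hood, here is a deliberately naive stand-in written for this README. It is not organ's parser: it splits on every comma (so expressions containing commas will break), and it assumes that a bare key copies the column of the same name and that "*" copies all columns, as the csvfilter help text describes.

```python
def map_expression(source):
    """Build a row-transforming function from "key = expr, ..." (naive sketch)."""
    clauses = []
    for clause in source.split(","):  # naive: breaks on commas inside expressions
        key, _, expr = clause.partition("=")
        key = key.strip()
        if key == "*":
            clauses.append((key, None))  # "*" means: copy every input column
            continue
        expr = expr.strip() or key       # assumption: a bare key copies that column
        clauses.append((key, compile(expr, "<map_expression>", "eval")))

    def transform(row):
        out = {}
        for key, code in clauses:
            if code is None:
                out.update(row)
            else:
                out[key] = eval(code, {}, dict(row))
        return out

    return transform

add_one = map_expression("foo = bar + 1")
with_date = map_expression("*, date = DateTime[0:10]")
print(add_one({"bar": 2}))
print(with_date({"DateTime": "2013-01-07 09:30:00"}))
```

Like any eval()-based approach, this should only be fed expressions you trust.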

organ.templategetter(str)

The templategetter function acts kind of like a Mustache template, but treats single curly braces as placeholders. So:

>>> full_name = organ.templategetter("{first} {last}")
>>> full_name({'first': 'Joe', 'last': 'Blow'})
'Joe Blow'
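
The same behavior can be approximated with str.format_map() from the standard library (Python 3). The stand-in below is a sketch, not organ's code, and how organ itself treats missing keys is not covered here; this version simply raises KeyError.

```python
def templategetter(template):
    """Return a function that fills {name} placeholders from a dict (sketch)."""
    def render(row):
        # format_map() looks up each placeholder in the dict directly.
        return template.format_map(row)
    return render

people = [
    {"first": "Joe", "last": "Blow"},
    {"first": "Jane", "last": "Doe"},
]
full_name = templategetter("{first} {last}")
names = [full_name(person) for person in people]
print(names)
```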

organ.sorter(str)

Organ's sorter() takes an expression()-compatible string, optionally prefixed with + (ascending, the default) or - (descending), and returns a comparison function that evaluates the expression for each of two values and compares the results. So:

>>> sorter = organ.sorter("+last")
>>> sorted([{'last': 'Zeldman'}, {'last': 'Allen'}], sorter)
[{'last': 'Allen'}, {'last': 'Zeldman'}]
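
Passing a comparison function as the second positional argument to sorted() only works on Python 2. On Python 3, a two-argument comparison function has to be wrapped with functools.cmp_to_key(). The sketch below uses a local stand-in for sorter() (an assumption about its behavior based on the description above, not organ's code) to show the pattern.

```python
from functools import cmp_to_key

def sorter(source):
    """Return a cmp-style function for "+expr" or "-expr" (sketch)."""
    descending = source.startswith("-")
    if source.startswith(("+", "-")):
        source = source[1:]              # drop the order prefix
    code = compile(source.strip(), "<sorter>", "eval")

    def compare(a, b):
        x = eval(code, {}, dict(a))      # evaluate the expression for each row
        y = eval(code, {}, dict(b))
        result = (x > y) - (x < y)       # 1, -1 or 0
        return -result if descending else result

    return compare

people = [{"last": "Zeldman"}, {"last": "Allen"}, {"last": "Meyer"}]
by_last = sorted(people, key=cmp_to_key(sorter("+last")))
by_last_desc = sorted(people, key=cmp_to_key(sorter("-last")))
print(by_last)
print(by_last_desc)
```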

organ.ascending(a, b)

Returns 1 if a > b, -1 if a < b, otherwise 0.

organ.descending(a, b)

Returns -1 if a > b, 1 if a < b, otherwise 0.
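
Both functions follow the classic cmp() contract. A minimal sketch that matches the return values described above (local stand-ins, not organ's source) looks like this:

```python
from functools import cmp_to_key

def ascending(a, b):
    """Return 1 if a > b, -1 if a < b, otherwise 0."""
    return (a > b) - (a < b)

def descending(a, b):
    """Return -1 if a > b, 1 if a < b, otherwise 0."""
    return ascending(b, a)

numbers = [3, 1, 2]
low_to_high = sorted(numbers, key=cmp_to_key(ascending))
high_to_low = sorted(numbers, key=cmp_to_key(descending))
print(low_to_high, high_to_low)
```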