text-as-data

A PyData 2013 talk on straightforward, data-driven ways to handle natural language text in Python.

This repository contains the code accompanying my talk, "How does text become data?".

The "src" directory contains the actual code. The .ipynb files are the interactive IPython notebooks. The .py versions of the files will run in ordinary Python, and just have oddly-formatted comments from IPython that you can disregard.

Running the code

  1. Use virtualenv:

         virtualenv-2.7 venv
         . venv/bin/activate

  2. Install the required Python libraries:

         pip install -r requirements.txt

  3. Download the NLTK input data from an IPython session:

         ipython
         import nltk
         nltk.download()
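
If a notebook fails on import, a quick check of the environment can help narrow down which setup step was missed. The following is a minimal sketch and not part of the repository: the helper names `in_virtualenv` and `missing_modules` are made up for illustration, and the module list is an assumption (only `nltk` is named in the steps above; the rest would come from requirements.txt).

```python
import importlib
import sys


def in_virtualenv():
    """Return True if the interpreter appears to run inside a virtual environment."""
    # Classic virtualenv sets sys.real_prefix; the venv module and newer
    # virtualenv releases make sys.base_prefix differ from sys.prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return getattr(sys, "base_prefix", sys.prefix) != sys.prefix


def missing_modules(names):
    """Return the entries of `names` that cannot be imported, in order."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumption: "nltk" is the only library named in the steps above;
    # extend this list with the packages from requirements.txt.
    print("virtualenv active: %s" % in_virtualenv())
    print("missing modules: %s" % missing_modules(["nltk"]))
```

An empty "missing modules" list suggests that step 2 succeeded. The check says nothing about whether the NLTK data from step 3 has been downloaded.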