dmep-python-intro
Intro to Python for Text and Data
Instructor: Allison Parrish
The goal of this course is to introduce students to essential techniques in working with text and data and to enhance their literacy in the language and practice of contemporary computer programming. Students will learn the Python programming language from scratch, and work toward making a small final project that meaningfully contributes to their interests and practice. Covered topics: working on the command line, Jupyter Notebook, data formats (CSV and JSON), Pandas, web scraping, text analysis with Natural Language Processing, Git and GitHub.
Schedule
Session 1
- Introduction
- Text processing on the command line
- Suggested exercise: UNIX command line exercise
- Collaborate to make a list of data journalism, data science and text analysis projects that are meaningful and aspirational for you.
Session 2
- Review the list of aspirational projects
- Python: Expressions and strings
- What is plain text?
- Python: Writing Python programs
- Suggested exercise: Write a Python program that mimics a UNIX command line tool. (Or otherwise creatively analyse or modify a text.)
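As a starting point for the Session 2 exercise, here is a minimal sketch of a program that imitates part of the UNIX `wc` tool. It is only an illustration, not the official solution: the function name `count_text` and the sample string are made up for this example, and the counts only roughly follow what `wc` reports (characters rather than bytes).

```python
def count_text(text):
    """Return (lines, words, characters) for a string, roughly like `wc`."""
    lines = text.count("\n")    # wc counts newline characters
    words = len(text.split())   # split on any run of whitespace
    chars = len(text)           # characters, not bytes
    return lines, words, chars


# Hypothetical sample text, standing in for the contents of a file
sample = "one two\nthree\n"
lines, words, chars = count_text(sample)
print(lines, words, chars)
```

To make this behave like a command line tool, one option is to replace the sample string with `sys.stdin.read()` (after `import sys`), so that text can be piped in with `python yourprogram.py < somefile.txt`.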
Session 3
- Jupyter Notebook tutorial
- Lists and loops
- Jupyter notebook from class
- Suggested exercise: Exercise A. (Download this Python file and make the modifications suggested in the comments until the output matches the suggestions.)
Session 4
- Dictionaries, sets and tuples
- Jupyter notebook we created in class
- Suggested exercise: Exercise B
Session 5
- Counting things
- Accessing Web APIs
- Jupyter notebook from class
- Suggested exercise: Web API Worksheet
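To give a flavor of the "Counting things" topic in Session 5, here is a small sketch that counts word frequencies with `collections.Counter` from the Python standard library. The class notebook may take a different approach (for example, a plain dictionary); the sentence below is invented for the example.

```python
from collections import Counter

# Hypothetical sample text; in practice this might come from a file or an API
text = "the cat saw the dog and the dog saw the cat"

# Split on whitespace and count how often each word occurs
counts = Counter(text.split())

# most_common(n) returns the n most frequent items as (item, count) pairs
top = counts.most_common(1)
print(top)  # [('the', 4)]
```

A `Counter` behaves like a dictionary, so `counts["dog"]` looks up a single word, and a word that never appeared gives `0` instead of raising an error.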
Session 6
- Scraping HTML with Beautiful Soup
- Jupyter notebook from class
- Suggested exercise: Web Scraping Worksheet
Session 7
- Pandas for simple data analysis and visualization
- Jupyter notebook from class
- Suggested exercise: Pandas worksheet
Session 8
- Pandas, continued
- Jupyter notebook from class
- Suggested exercise: Continue the Pandas worksheet
Session 9
- Terms to know when talking about language
- Intro to NLP with spaCy (skipped!)
- Parsing and tagging Chinese with Jieba
- Understanding word vectors
- Experimental Chinese word vector notebook! Very incomplete! (Requires this file of 50k word vectors from FastText. Do not decompress!)
Session 10
- Project presentations