Joshua Tauberer
The issues in this repo are best suited for bug reports related to the scripts in this repository (vs ideas). I'll leave the issue open for a while if anyone...
The thing you're proposing working on is about 1000x harder than documentation and CSV output. (I'm not exaggerating.) I don't know of (or can't think of) a better place to...
I think we'd all be fine with adding more columns if you want to submit a pull request. :)
I guess the question is, what happens if the committee membership scraper is run after? Maybe the thing to do is to revise _that_ scraper and have it skip members...
That was a file produced by GovTrack (me), from the data in this repository. The same information (but in a different format) is in this repository if you just scroll...
Forgot links: http://www.gpo.gov/fdsys/pkg/GPO-CDOC-108hdoc222/pdf/GPO-CDOC-108hdoc222-3-2.pdf http://www.gpo.gov/fdsys/pkg/GPO-CDOC-108hdoc222/pdf/GPO-CDOC-108hdoc222-3-70.pdf
We usually convert PDF to text first using `pdftotext` and then scrape it from there, so if you can do that and write the rest in Python that would be...
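A rough sketch of that two-step approach, in case it helps: shell out to `pdftotext` (from poppler-utils) and then pick rows out of the text with a regular expression. The `-layout` flag and `-` for stdout are real `pdftotext` options, but the function names and the row pattern (a name, a gap of two or more spaces, then a state) are assumptions for illustration; the real PDFs will need their own pattern.

```python
import re
import subprocess


def pdf_to_text(pdf_path):
    """Run the pdftotext CLI and return the extracted text.

    "-layout" preserves the column layout; "-" writes to stdout.
    Requires pdftotext to be installed and on the PATH.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


# Hypothetical row format: "<name>  <state>", separated by 2+ spaces.
ROW = re.compile(r"^(?P<name>\S.*?)\s{2,}(?P<state>[A-Z][A-Za-z ]+)$")


def parse_rows(text):
    """Return a list of {"name", "state"} dicts for lines matching ROW."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            rows.append({
                "name": match.group("name"),
                "state": match.group("state").strip(),
            })
    return rows
```

Keeping the PDF conversion separate from the parsing means the parsing step can be rerun and checked against the saved text without touching the PDF again.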
1: One-off, yes. I use "scraper" kind of broadly. But even with one-off things, it's useful to have it be reproducible (fully automated, traceable back to source data, etc.). 3a:...
Hey, this is a great start. The normalization into state, chamber, name, and note fields is great. It's still a ways away from being something we can integrate...
I put in the pretty-print one: https://github.com/unitedstates/congress-legislators/commit/bdb520a727cf85a770670be92ffd896812514eda The next step would be to add bioguide IDs to each person in your file using e.g. the session start or end date...
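One possible way to do that lookup, as a minimal sketch only: match on last name and state, and keep legislators who had a term covering a given date. It assumes legislator records shaped like the repository's YAML entries (`id.bioguide`, `name.last`, and `terms` with `start`, `end`, and `state`), with dates as ISO strings. The `person` fields, the function name, and the sample data in the usage note are made up for illustration.

```python
from datetime import date


def find_bioguide(person, legislators, on_date):
    """Return bioguide IDs of legislators whose last name and state match
    `person` and who had a term covering `on_date`.

    `person` is a hypothetical record with "last_name" and "state" keys.
    """
    matches = []
    for leg in legislators:
        if leg["name"]["last"].lower() != person["last_name"].lower():
            continue
        for term in leg["terms"]:
            start = date.fromisoformat(term["start"])
            end = date.fromisoformat(term["end"])
            if term["state"] == person["state"] and start <= on_date <= end:
                matches.append(leg["id"]["bioguide"])
                break  # one matching term is enough for this legislator
    return matches
```

For example, with one made-up record `{"id": {"bioguide": "X000001"}, "name": {"last": "Example"}, "terms": [{"start": "1901-03-04", "end": "1903-03-03", "state": "VA"}]}`, calling `find_bioguide({"last_name": "Example", "state": "VA"}, [record], date(1902, 1, 1))` gives a one-item list. Zero or multiple matches would need a manual look.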