Joshua Tauberer

492 comments by Joshua Tauberer

The issues in this repo are best suited for bug reports related to the scripts in this repository (vs ideas). I'll leave the issue open for a while if anyone...

The thing you're proposing working on is about 1000x harder than documentation and CSV output. (I'm not exaggerating.) I don't know of (or can't think of) a better place to...

I think we'd all be fine with adding more columns if you want to submit a pull request. :)

I guess the question is, what happens if the committee membership scraper is run after? Maybe the thing to do is to revise _that_ scraper and have it skip members...

That was a file produced by GovTrack (me), from the data in this repository. The same information (but in a different format) is in this repository if you just scroll...

Forgot links: http://www.gpo.gov/fdsys/pkg/GPO-CDOC-108hdoc222/pdf/GPO-CDOC-108hdoc222-3-2.pdf http://www.gpo.gov/fdsys/pkg/GPO-CDOC-108hdoc222/pdf/GPO-CDOC-108hdoc222-3-70.pdf

We usually convert PDF to text first using `pdftotext` and then scrape it from there, so if you can do that and write the rest in Python that would be...
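
The `pdftotext`-then-scrape workflow mentioned above could be sketched roughly as follows. This is only an illustration, not the repository's actual scraper: it assumes Poppler's `pdftotext` is on the PATH, and the helper names `pdf_to_text` and `split_columns`, along with the assumption that columns are separated by runs of two or more spaces, are hypothetical.

```python
import re
import subprocess


def pdf_to_text(pdf_path):
    """Run Poppler's pdftotext and return the extracted text (hypothetical helper)."""
    # "-layout" keeps the physical column layout; "-" sends the output to stdout.
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def split_columns(text):
    """Split each non-empty line into columns on runs of 2+ spaces.

    Assumption: the PDF is tabular and "-layout" output pads columns
    with at least two spaces. Real documents will need more cleanup.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines and page-break residue
        rows.append(re.split(r"\s{2,}", line))
    return rows
```

Keeping the conversion step separate from the parsing step means the parsing can be rerun and checked against saved text output without calling `pdftotext` again.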

1: One-off, yes. I use "scraper" kind of broadly. But even with one-off things, it's useful to have it be reproducible (fully automated, traceable back to source data, etc.). 3a:...

Hey, this is a great start. The normalization into state, chamber, name, and note fields is great. It's still a bit of a ways away from being something we can integrate...

I put in the pretty-print one: https://github.com/unitedstates/congress-legislators/commit/bdb520a727cf85a770670be92ffd896812514eda The next step would be to add bioguide IDs to each person in your file using e.g. the session start or end date...