Eric Mill issues

Results: 227 issues of Eric Mill

Senate committee contact info

We have phone, website, and office/address for House committees, but not Senate committees. The website, at least, seems present on Senate.gov. Example: http://www.senate.gov/general/committee_membership/committee_memberships_SSBK.htm

Handle YouTube channels better

The channel IDs don't obey the contract of being able to say `youtube.com/[ID]` to construct a valid URL. They only work at `youtube.com/channel/[ID]`. I don't know if YouTube worked this...
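A minimal sketch of the distinction described here, assuming each record says whether its value is a legacy username or a channel ID. The function name and the `is_channel_id` flag are hypothetical and not part of the dataset.

```python
def youtube_url(identifier: str, is_channel_id: bool) -> str:
    """Build a YouTube URL from a legacy username or a channel ID.

    `is_channel_id` is a hypothetical flag; the real data may
    distinguish the two kinds of value differently.
    """
    if is_channel_id:
        # Channel IDs only resolve under the /channel/ path.
        return f"https://www.youtube.com/channel/{identifier}"
    # Legacy usernames resolve directly under the site root.
    return f"https://www.youtube.com/{identifier}"
```

Keeping the two kinds of value in separate fields would let consumers build either URL without guessing which form they hold.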

New Senate Periodical Press Gallery site, tons of resources

The Senate Periodical Press Gallery has revamped itself, and now offers a bunch of new resources: http://www.periodicalpress.senate.gov/ - What they've always offered: [to-the-minute Senate floor updates](http://www.periodicalpress.senate.gov/), though without any timestamps....

Historical committee name changes

The House's History site has a page dedicated to them: http://history.house.gov/Records-and-Research/FAQs/Committee-Names/ Thanks to @danielschuman for pointing this out.

Re-open ticket for a member if that member's form is broken

As reported by anyone integrating the data. The logic for an integrator should be: if it's closed, re-open it. If it's open, do nothing (I imagine GitHub will silently...
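The closed/open rule can be sketched as a small decision function. This is an illustration only: `reopen_payload` is a hypothetical name, and it assumes the integrator already knows the ticket's current state. The returned dictionary is the kind of body GitHub's REST "update an issue" endpoint accepts in its `state` field.

```python
from typing import Optional


def reopen_payload(issue_state: str) -> Optional[dict]:
    """Decide what to send when a member's form is reported broken.

    Returns a request body that re-opens the ticket, or None when
    nothing needs to be done.
    """
    if issue_state == "closed":
        # Closed ticket plus a broken form: re-open it.
        return {"state": "open"}
    if issue_state == "open":
        # Already open: do nothing.
        return None
    raise ValueError(f"unexpected issue state: {issue_state!r}")
```

Returning `None` for an open ticket lets the integrator skip the request entirely rather than rely on how the server treats a redundant update.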

DOM monitor library may be helpful

This seems handy: https://github.com/fouber/page-monitor

Evaluate scraping oversight.gov for supported IGs

Though there's no bulk data and it doesn't cover every IG, oversight.gov at least has consistent HTML for many IGs. Though, we also may be interested in identifying differences or...

Check for PDF "attachments" with pdftk

Even if the USPS or DHS IGs don't have them, at least set up a process that emails the admin whenever it detects any.
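One possible shape for that process, sketched under assumptions: it relies on pdftk's `unpack_files` operation to extract any attachments into a temporary directory, and it only builds the alert text, leaving the actual email delivery to whatever mailer the project uses. All function names here are hypothetical.

```python
import os
import subprocess
import tempfile
from typing import List, Optional


def unpack_command(pdf_path: str, out_dir: str) -> List[str]:
    """Build the pdftk invocation that extracts attachments into out_dir."""
    return ["pdftk", pdf_path, "unpack_files", "output", out_dir]


def find_attachments(pdf_path: str) -> List[str]:
    """Return the names of files attached to the PDF (requires pdftk)."""
    with tempfile.TemporaryDirectory() as out_dir:
        subprocess.run(unpack_command(pdf_path, out_dir), check=True)
        return sorted(os.listdir(out_dir))


def attachment_alert(pdf_path: str, attachments: List[str]) -> Optional[str]:
    """Return the alert text for the admin, or None if nothing was found."""
    if not attachments:
        return None
    names = ", ".join(attachments)
    return f"{pdf_path} has {len(attachments)} attachment(s): {names}"
```

A scraper run could call `find_attachments` on each downloaded report and pass any non-`None` alert to the admin notification step.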

Data Improvement

Integrate OCRing where needed

I'm not sure of the best path for detecting reports that need OCRing (perhaps through a flag set by the scraper), but we should have `tesseract` for OCRing of some...

GAO's own reports

Not the GAO IG, but the [GAO](http://www.gao.gov/) itself, which publishes an amazing number of excellent reports. There are four interesting datasets, with two known existing scrapers: - Reports, for which...