openstates-scrapers

MA Bill Scraper

Open NewAgeAirbender opened this issue 2 years ago • 4 comments

Updates the MA scraper to pull from the API instead of the legislative site. It should work, but I haven't finished a scrape successfully locally because the API cuts out around an hour into the scrape (as of 12-20-2021). I've spot-checked the bill data and it looks good. I haven't gotten a vote event to save, so I'm hesitant to say votes are working, but I noticed votes aren't working in the scraper as it currently stands either, so we could remove that part and return to it later. House vote details aren't available through the API, so if we wanted those details we'd have to add a PDF-processing step.

I also saw that there's a PR for scraping docket numbers. I can easily add that to the Bill from the initial BillDetail process_item, since the number is available on that detail page, but I wasn't sure exactly where that information should be added to the Bill.

NewAgeAirbender avatar Jan 03 '22 22:01 NewAgeAirbender

When I try to run this I am getting:

  File "/Users/james/Library/Caches/pypoetry/virtualenvs/openstates-scrapers-93h4Z7Fx-py3.9/lib/python3.9/site-packages/spatula/pages.py", line 186, in _to_items
    result = self.process_page()
TypeError: 'generator' object is not callable

I think this is related to the @property decorator on line 61. I can remove that and get it to run, but I wanted to check what the intent was there, and whether there are other unpushed changes, before I give this a full workout.
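For anyone hitting the same TypeError, here is a minimal sketch of the failure mode. It does not use spatula; `Page`, `GoodPage`, and `BadPage` are hypothetical stand-ins, and the only assumption is that the framework invokes `self.process_page()` as a method, as the traceback above shows.

```python
class Page:
    """Hypothetical stand-in for a framework page class (not spatula's own)."""

    def _to_items(self):
        # The framework calls process_page as a method.
        result = self.process_page()
        return list(result)


class GoodPage(Page):
    def process_page(self):
        yield "bill-1"
        yield "bill-2"


class BadPage(Page):
    @property
    def process_page(self):
        # With @property, attribute access already runs this function and
        # returns a generator, so the framework's trailing () ends up
        # calling the generator object itself.
        yield "bill-1"


print(GoodPage()._to_items())

try:
    BadPage()._to_items()
except TypeError as exc:
    print(exc)
```

Dropping the decorator makes `process_page` a plain generator method again, which is the shape the caller expects.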

jamesturk avatar Jan 05 '22 15:01 jamesturk

Also getting this when I try to run w/ some modifications:

    result = "pass" if self.input.yes > self.input.no else "fail"
AttributeError: 'PartialBill' object has no attribute 'yes'

jamesturk avatar Jan 05 '22 16:01 jamesturk

Moving the vote part to a different branch to break up the ticket. Still haven't gotten a successful full run, since I get this error, but spot checking the bills that do save looks good.

    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

NewAgeAirbender avatar Jan 17 '22 19:01 NewAgeAirbender

Pushed a few changes up to this branch, but still haven't gotten it to complete.

A couple of things:

  • I added a way to reject a response from spatula, triggering the retry logic if the response is rejected. This allows us to treat non-JSON responses as errors. This got it to retry some failing requests.
  • Occasionally an actions response was coming back blank. I added a special case for that for now, but it's probably worth figuring out why, since a blank response would wipe the bill's actions if the scrape succeeded. (Not a blocker IMO, but not great either.)
  • Right now you need to pass self to do_scrape if you're using bobsled with openstates-core. There's an example of this in this scraper now, but without that a lot of the scraper options don't propagate. I'd love to work on improving that workflow as we move more bill scrapers to spatula.
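To illustrate the first point, a rough sketch of the "reject non-JSON, then retry" idea using only the standard library. This is not the code pushed to the branch and not spatula's API: `RejectedResponse`, `parse_json_or_reject`, `fetch_with_retries`, and the `BillNumber` field are names made up for the example.

```python
import json
import time


class RejectedResponse(Exception):
    """Hypothetical: the response arrived but its body is unusable."""


def parse_json_or_reject(body):
    # Treat anything that is not valid JSON (HTML error page, empty body)
    # as a failed request rather than as data.
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        raise RejectedResponse(f"non-JSON response: {body[:40]!r}") from exc


def fetch_with_retries(fetch, retries=3, wait_seconds=0.0):
    """Call fetch() until it returns valid JSON, up to retries + 1 attempts."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return parse_json_or_reject(fetch())
        except RejectedResponse as exc:
            last_error = exc
            if attempt < retries:
                time.sleep(wait_seconds)
    raise last_error


# Simulated API: the first reply is an HTML error page, the second is JSON.
bodies = iter(["<html>Service Unavailable</html>", '{"BillNumber": "H1"}'])
result = fetch_with_retries(lambda: next(bodies))
print(result)
```

The point of the separate exception is that a rejected body goes through the same retry path as a transport failure, instead of being parsed as if it were real bill data.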

But, even with all this, sometimes the API just goes down. It 500s for several minutes and doesn't recover within 15 (I had it retrying that long at one point). That to me says that their API just isn't ready for the big show.
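For reference, "retrying that long" can be expressed as a time budget rather than a fixed attempt count. The sketch below is an assumption about how one might do it, not what the scraper actually does: `retry_until_deadline` and all of its parameters are hypothetical, and the exception types to retry on are left to the caller.

```python
import time


def retry_until_deadline(call, max_seconds=15 * 60, first_wait=1.0, max_wait=60.0,
                         retry_on=(OSError,), sleep=time.sleep, clock=time.monotonic):
    """Retry call() with capped exponential backoff until the time budget runs out."""
    deadline = clock() + max_seconds
    wait = first_wait
    while True:
        try:
            return call()
        except retry_on:
            if clock() + wait > deadline:
                raise  # give up: the outage outlasted the budget
            sleep(wait)
            wait = min(wait * 2, max_wait)  # back off, but never beyond max_wait
```

`sleep` and `clock` are injectable so the behaviour can be checked without actually waiting. Even with a 15-minute budget this only helps when the API comes back within that window, which it did not in the runs described above.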

jamesturk avatar Feb 15 '22 19:02 jamesturk