django-calaccess-raw-data

Log of instances where CAL-ACCESS zip_file download fails

Open gordonje opened this issue 9 years ago • 24 comments

Today was the first time we noticed issues accessing the CAL-ACCESS zip file, which has typically been updated at http://campaignfinance.cdn.sos.ca.gov/dbwebexport.zip every day.

I've been trying it all day and have gotten either of the following errors each time:

  • requests.exceptions.ConnectionError: ('Connection aborted.', BadStatusLine("''",))
  • zipfile.BadZipfile: File is not a zip file

In the case of the latter, the download appears to start before the error is thrown. I have also pointed Chrome at this URL and watched the file start to download, then stop abruptly. The size of the resulting file varies, but either way I cannot open the zip file.

gordonje avatar Jan 29 '16 23:01 gordonje

I experienced the same error.

palewire avatar Jan 30 '16 14:01 palewire

Update on this.

I checked on Sat, Jan 30, and was able to download the zip file completely. But again this morning, I am getting the same zipfile.BadZipfile: File is not a zip file error. And once again, when I point Chrome at http://campaignfinance.cdn.sos.ca.gov/dbwebexport.zip, the file seems to start downloading, then stops abruptly, and I cannot open the zip file.

gordonje avatar Feb 01 '16 17:02 gordonje

More trouble-shooting details:

  • Sending a HEAD request to the zip file URL results in a successful response (i.e., HTTP response status code 200).
  • Based on the Last-Modified value in the response headers, it seems like this should be a new file posted today. Here's all the headers:
{
    'Content-Length': '764145684',
    'Accept-Ranges': 'bytes',
    'Expires': 'Mon, 01 Feb 2016 17:51:17 GMT',
    'Last-Modified': 'Mon, 01 Feb 2016 12:20:05 GMT',
    'Connection': 'keep-alive',
    'ETag': '41cf48801c8a49e387987e925ee4137d',
    'X-Timestamp': '1454329204.65535',
    'Cache-Control': 'public, max-age=541',
    'X-Trans-Id': 'tx4595828ccf3d4004b55b2-0056af9791dfw1',
    'Date': 'Mon, 01 Feb 2016 17:42:16 GMT',
    'Content-Type': 'application/zip',
    'X-Object-Meta-Screenie': '41cf48801c8a49e387987e925ee4137d'
}
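For anyone who wants to repeat this check, here is a minimal sketch of reading those headers in Python. `summarize_headers` is a hypothetical helper, not part of this app. It uses only the standard library and assumes a dict shaped like the one above (for example, what `requests.head(url).headers` returns).

```python
from email.utils import parsedate_to_datetime


def summarize_headers(headers):
    """Return (size_in_mb, last_modified_datetime) from response headers."""
    size_mb = int(headers['Content-Length']) / (1024 * 1024)
    last_modified = parsedate_to_datetime(headers['Last-Modified'])
    return size_mb, last_modified


# Values copied from the headers above.
headers = {
    'Content-Length': '764145684',
    'Last-Modified': 'Mon, 01 Feb 2016 12:20:05 GMT',
}
size_mb, last_modified = summarize_headers(headers)
print('%.0f MB, last modified %s' % (size_mb, last_modified.isoformat()))
```

A 200 on the HEAD request only says the object exists. It says nothing about whether the full body can actually be transferred, which is the part that is failing here.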

gordonje avatar Feb 01 '16 17:02 gordonje

Today I was only able to download 6 of 750 MBs before getting the zipfile.BadZipfile: File is not a zip file error. On the second attempt, I got a requests.exceptions.ConnectionError: ('Connection aborted.', BadStatusLine("''",)) error.
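One way to catch a truncated download before trying to unzip it is to compare the bytes on disk with the Content-Length the server advertised. This is only a sketch: `download_is_complete` is a hypothetical helper, and the 6-byte file below stands in for a partial download.

```python
import os
import tempfile


def download_is_complete(path, content_length):
    """True if the file on disk has exactly as many bytes as the server advertised."""
    return os.path.getsize(path) == int(content_length)


# Simulate a download that was cut off after 6 bytes of an advertised 750.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b'PK\x03\x04\x00\x00')
    partial_path = f.name

print(download_is_complete(partial_path, '750'))
os.remove(partial_path)
```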

:disappointed:

gordonje avatar Feb 02 '16 15:02 gordonje

Here are the contents of today's response header:

{
    'Content-Length': '768221570',
    'Accept-Ranges': 'bytes',
    'Expires': 'Tue, 02 Feb 2016 17:41:36 GMT',
    'Last-Modified': 'Tue, 02 Feb 2016 12:20:06 GMT',
    'Connection': 'keep-alive',
    'ETag': 'b88dcc617780ff1eb674008f66be552f',
    'X-Timestamp': '1454415605.30877',
    'Cache-Control': 'public, max-age=900',
    'X-Trans-Id': 'tx8648e86097ac484c954c9-0056b0e6ccdfw1',
    'Date': 'Tue, 02 Feb 2016 17:26:36 GMT',
    'Content-Type': 'application/zip',
    'X-Object-Meta-Screenie': 'b88dcc617780ff1eb674008f66be552f'
}

gordonje avatar Feb 02 '16 17:02 gordonje

How do we normally handle errors like this? Do we just throw the error returned? I imagine this'll happen again, so I'm curious what the best approach would be for dealing with this issue.

aboutaaron avatar Feb 02 '16 22:02 aboutaaron

FWIW, I was able to download the zip just now

aboutaaron avatar Feb 02 '16 22:02 aboutaaron

The zip file became available around 2:30 PST this afternoon.

gordonje avatar Feb 02 '16 22:02 gordonje

@aboutaaron We could definitely throw a more explicit error, and encourage the user to try pointing their web browser at the url, contacting the SoS directly, trolling them on Twitter, etc.
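A rough sketch of what a more explicit error could look like. `SnapshotUnavailableError` and `open_snapshot` are made-up names for illustration, not what the download command currently does. The check relies on `zipfile.is_zipfile` from the standard library.

```python
import os
import tempfile
import zipfile


class SnapshotUnavailableError(Exception):
    """Hypothetical error type for an incomplete or corrupt download."""


def open_snapshot(zip_path, url):
    """Open the archive, or raise an error that tells the user what to try next."""
    if not zipfile.is_zipfile(zip_path):
        raise SnapshotUnavailableError(
            "%s is not a valid zip file (%d bytes on disk). The download from %s "
            "was probably cut short. Try opening the URL in a web browser, or "
            "contact the Secretary of State's office."
            % (zip_path, os.path.getsize(zip_path), url)
        )
    return zipfile.ZipFile(zip_path)


# Demonstrate with a fake truncated download.
with tempfile.NamedTemporaryFile(suffix='.zip', delete=False) as f:
    f.write(b'PK\x03\x04 truncated')
    bad_path = f.name

try:
    open_snapshot(bad_path, 'http://campaignfinance.cdn.sos.ca.gov/dbwebexport.zip')
except SnapshotUnavailableError as e:
    print(e)
finally:
    os.remove(bad_path)
```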

gordonje avatar Feb 02 '16 22:02 gordonje

I encountered this same error earlier, but was able to download the file later in the day -- consistent with @gordonje's experience.

palewire avatar Feb 03 '16 00:02 palewire

Welp, not working this morning.

First attempt only downloads 102 of 752 MBs, then I get the zipfile.BadZipfile: File is not a zip file error.

Second attempt throws requests.exceptions.ConnectionError: ('Connection aborted.', BadStatusLine("''",)).

And based on the header, this should be a new file posted early this morning at the usual time:

{
    'Content-Length': '769605861',
    'Accept-Ranges': 'bytes',
    'Expires': 'Wed, 03 Feb 2016 18:20:24 GMT',
    'Last-Modified': 'Wed, 03 Feb 2016 12:20:06 GMT',
    'Connection': 'keep-alive',
    'ETag': '9080338c23b240c5bb0905218e67269b',
    'X-Timestamp': '1454502005.16910',
    'Cache-Control': 'public, max-age=900',
    'X-Trans-Id': 'tx31ae0f2984744a88874c0-0056b24164dfw1',
    'Date': 'Wed, 03 Feb 2016 18:05:24 GMT',
    'Content-Type': 'application/zip',
    'X-Object-Meta-Screenie': '9080338c23b240c5bb0905218e67269b'
}

Will try again later this afternoon.

gordonje avatar Feb 03 '16 18:02 gordonje

Well, that sucks haha. I wonder what's happening on their end

aboutaaron avatar Feb 03 '16 18:02 aboutaaron

Do we have a contact there, or do they have a contact page? Seems worthwhile to reach out, since it's an ongoing issue.

bcipolli avatar Feb 03 '16 18:02 bcipolli

I was able to download a file yesterday evening. Looks like it went up at 6:21 PM GMT. There was another file posted this morning at 12:20 GMT, and I was able to download and unzip it.

Even so, because of the recent intermittent service and because we're on the cusp of being able to deploy the raw-data app somewhere it will run daily, I think it's still worth reaching out to them, if only to ask "When should we schedule our requests for the new zip files?"

gordonje avatar Feb 04 '16 16:02 gordonje

Down again today. Same deal with the zipfile.BadZipfile or requests.exceptions.ConnectionError: ('Connection aborted.', BadStatusLine("''",)) error.

Looks like it should be a new file for today.

{
    'Content-Length': '769895271',
    'Accept-Ranges': 'bytes',
    'Expires': 'Mon, 08 Feb 2016 20:36:03 GMT',
    'Last-Modified': 'Mon, 08 Feb 2016 12:20:05 GMT',
    'Connection': 'keep-alive',
    'ETag': 'c4f13c4c272ef4bc18a6831735c0385c',
    'X-Timestamp': '1454934004.50330',
    'Cache-Control': 'public, max-age=900',
    'X-Trans-Id': 'tx8dc685337924407087783-0056b8f8abdfw1',
    'Date': 'Mon, 08 Feb 2016 20:21:03 GMT',
    'Content-Type': 'application/zip',
    'X-Object-Meta-Screenie': 'c4f13c4c272ef4bc18a6831735c0385c'
}

gordonje avatar Feb 08 '16 20:02 gordonje

I had the same error moments ago, when the file was truncated long before it had finished downloading.

screenshot from 2016-02-08 13 46 41

palewire avatar Feb 08 '16 21:02 palewire

It's working this morning. The current file was posted at 14:51:45 GMT, a little later than what has been typical.

gordonje avatar Feb 09 '16 15:02 gordonje

Not working this morning. Two attempts, both with truncated zip files.

Looks like a new one was posted at the regular time:

{
    'Content-Length': '770040404',
    'Accept-Ranges': 'bytes',
    'Expires': 'Wed, 10 Feb 2016 18:14:13 GMT',
    'Last-Modified': 'Wed, 10 Feb 2016 12:20:06 GMT',
    'Connection': 'keep-alive',
    'ETag': '7892029f45e4b1dbba0253f9fa0658c4',
    'X-Timestamp': '1455106805.44704',
    'Cache-Control': 'public, max-age=865',
    'X-Trans-Id': 'tx2c18e38be9f04247bd0de-0056bb7a94dfw1',
    'Date': 'Wed, 10 Feb 2016 17:59:48 GMT',
    'Content-Type': 'application/zip',
    'X-Object-Meta-Screenie': '7892029f45e4b1dbba0253f9fa0658c4'
}

gordonje avatar Feb 10 '16 18:02 gordonje

My call to the Secretary of State's office today about this issue was ignored.

I am having the error again at home tonight.

ben@bunkerhill ~/Code/django-calaccess-raw-data
 % python example/manage.py updatecalaccessrawdata


The currently available CAL-ACCESS snapshot was released by the California Secretary of State on Feb. 10, 2016, at 4:20 a.m. Pacific Time.

It is 734M in size. You downloaded up to 20M as of 4 minutes ago.

Do you want to download it to /home/ben/Code/django-calaccess-raw-data/example/data and update your local database?

Type 'yes' to do it, or 'no' to back out:
yes
Downloading ZIP file
[################################] 36864/731513 - 00:01:07
 Unzipping archive
Traceback (most recent call last):
  File "example/manage.py", line 9, in <module>
    execute_from_command_line(sys.argv)
  File "/home/ben/.virtualenvs/django-calaccess-raw-data/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 353, in execute_from_command_line
    utility.execute()
  File "/home/ben/.virtualenvs/django-calaccess-raw-data/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 345, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/home/ben/.virtualenvs/django-calaccess-raw-data/local/lib/python2.7/site-packages/django/core/management/base.py", line 348, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/home/ben/.virtualenvs/django-calaccess-raw-data/local/lib/python2.7/site-packages/django/core/management/base.py", line 399, in execute
    output = self.handle(*args, **options)
  File "calaccess_raw/management/commands/updatecalaccessrawdata.py", line 159, in handle
    noinput=options['noinput'],
  File "/home/ben/.virtualenvs/django-calaccess-raw-data/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 119, in call_command
    return command.execute(*args, **defaults)
  File "/home/ben/.virtualenvs/django-calaccess-raw-data/local/lib/python2.7/site-packages/django/core/management/base.py", line 399, in execute
    output = self.handle(*args, **options)
  File "calaccess_raw/management/commands/downloadcalaccessrawdata.py", line 146, in handle
    self.unzip()
  File "calaccess_raw/management/commands/downloadcalaccessrawdata.py", line 236, in unzip
    with zipfile.ZipFile(self.zip_path) as zf:
  File "/usr/lib/python2.7/zipfile.py", line 770, in __init__
    self._RealGetContents()
  File "/usr/lib/python2.7/zipfile.py", line 811, in _RealGetContents
    raise BadZipfile, "File is not a zip file"
zipfile.BadZipfile: File is not a zip file

palewire avatar Feb 11 '16 06:02 palewire

The ZIP was not available to me this morning.

It seemed that the entire Secretary of State website was down when I tried to visit it in my browser from my home wifi. That was confirmed by just-ping.com, which returned the following results.

screenshot - 03202016 - 10 48 34 am

Here is the error I drew during my download attempt with our software.

 % python example/manage.py updatecalaccessrawdata
Traceback (most recent call last):
  File "example/manage.py", line 9, in <module>
    execute_from_command_line(sys.argv)
  File "/home/ben/.virtualenvs/django-calaccess-raw-data/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 353, in execute_from_command_line
    utility.execute()
  File "/home/ben/.virtualenvs/django-calaccess-raw-data/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 345, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/home/ben/.virtualenvs/django-calaccess-raw-data/local/lib/python2.7/site-packages/django/core/management/base.py", line 348, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/home/ben/.virtualenvs/django-calaccess-raw-data/local/lib/python2.7/site-packages/django/core/management/base.py", line 399, in execute
    output = self.handle(*args, **options)
  File "calaccess_raw/management/commands/updatecalaccessrawdata.py", line 120, in handle
    download_metadata = self.get_download_metadata()
  File "calaccess_raw/management/commands/__init__.py", line 54, in get_download_metadata
    request = requests.head(self.url)
  File "/home/ben/.virtualenvs/django-calaccess-raw-data/local/lib/python2.7/site-packages/requests/api.py", line 93, in head
    return request('head', url, **kwargs)
  File "/home/ben/.virtualenvs/django-calaccess-raw-data/local/lib/python2.7/site-packages/requests/api.py", line 53, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/ben/.virtualenvs/django-calaccess-raw-data/local/lib/python2.7/site-packages/requests/sessions.py", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/ben/.virtualenvs/django-calaccess-raw-data/local/lib/python2.7/site-packages/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/home/ben/.virtualenvs/django-calaccess-raw-data/local/lib/python2.7/site-packages/requests/adapters.py", line 437, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='campaignfinance.cdn.sos.ca.gov', port=80): Max retries exceeded with url: /dbwebexport.zip (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f2ee37343d0>: Failed to establish a new connection: [Errno -2] Name or service not known',))

palewire avatar Mar 20 '16 17:03 palewire

It is working for me now. I just downloaded the zip with a Python script.

ssdatar avatar Apr 05 '16 17:04 ssdatar

@ssdatar We're sure this issue is entirely on the SoS's end. It's also intermittent and unpredictable. @palewire recently got some inside info suggesting it's a result of some monthly internal CAL-ACCESS processes, which he might want to elaborate on.

But we are keeping this ticket open in order to document any instance when the file is not available so that we can refer responsible parties to the relevant details. Other than that, there's nothing to do on this one.

The bug label might be a little confusing for this ticket since the issue is largely out of our control. Gonna remove it now unless anyone objects.

gordonje avatar Apr 06 '16 00:04 gordonje

On Saturday, August 6, 2016, we had a ZIP download fail. We don't know why. An attempt to resume it the same day also failed. We then deleted the partial download and started from scratch, and that worked.

It happened again the next day and a full wipe and redownload worked.
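For reference, a sketch of the resume-or-restart decision, assuming the server honors byte ranges (the February headers earlier in this thread included Accept-Ranges: bytes). `plan_download` is a hypothetical helper and not the app's actual resume logic. It only decides what to do and which request headers to send.

```python
import os


def plan_download(partial_path, content_length):
    """Return ('skip' | 'resume' | 'restart', extra_request_headers)."""
    total = int(content_length)
    have = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    if have == total:
        # Everything is already on disk.
        return 'skip', {}
    if 0 < have < total:
        # Ask the server for only the missing bytes.
        return 'resume', {'Range': 'bytes=%d-' % have}
    # Nothing on disk, or more bytes than expected: start over.
    return 'restart', {}


print(plan_download('does-not-exist.zip', '811294036'))
```

If a new snapshot is posted between attempts, the bytes already on disk belong to the old file, so a resume would produce a corrupt archive. Comparing the ETag from the first attempt with the current one before resuming would guard against that.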

palewire avatar Aug 08 '16 16:08 palewire

As of this morning, July 5, 2017, the CAL-ACCESS bulk download has not been updated in five days, since June 30, 2017.

$ date
Wed Jul  5 12:07:58 PDT 2017
$ curl -I http://campaignfinance.cdn.sos.ca.gov/dbwebexport.zip
HTTP/1.1 200 OK
Server: Apache/2.2.3 (Red Hat)
Last-Modified: Fri, 30 Jun 2017 11:20:28 GMT
ETag: "2320c8-305b5d54-9ab7f700"
Accept-Ranges: bytes
Content-Length: 811294036
Content-Type: application/zip
Date: Wed, 05 Jul 2017 19:08:01 GMT
Connection: keep-alive
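The same staleness check could be scripted. A minimal sketch using only the standard library follows. `days_since_update` is a hypothetical helper that takes the Last-Modified and Date header values as strings.

```python
from email.utils import parsedate_to_datetime


def days_since_update(last_modified, date):
    """Whole days between the Last-Modified and Date response headers."""
    delta = parsedate_to_datetime(date) - parsedate_to_datetime(last_modified)
    return delta.days


# Values copied from the curl output above.
print(days_since_update('Fri, 30 Jun 2017 11:20:28 GMT',
                        'Wed, 05 Jul 2017 19:08:01 GMT'))  # prints 5
```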

palewire avatar Jul 05 '17 19:07 palewire