django-calaccess-raw-data
Log of instances where CAL-ACCESS zip_file download fails
Today was the first time we noticed issues accessing the CAL-ACCESS zip file, which has typically been updated at http://campaignfinance.cdn.sos.ca.gov/dbwebexport.zip every day.
I've been trying it all day and have gotten either of the following errors each time:
requests.exceptions.ConnectionError: ('Connection aborted.', BadStatusLine("''",))
zipfile.BadZipfile: File is not a zip file
In the case of the latter, it will seem like it's starting to download before throwing the error. I have also pointed Chrome at this url, and will see the file start to download, then stop abruptly. The size of the resulting file varies, but regardless I cannot open the zip file.
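A truncated transfer like this tends to surface only later, as BadZipfile when the archive is opened. Below is a minimal sketch of an up-front check. It assumes the expected size is taken from the Content-Length header of a HEAD request. looks_complete is a hypothetical helper, not part of django-calaccess-raw-data.

```python
import os
import zipfile


def looks_complete(path, expected_size):
    """Return True if the file has the advertised size and a readable ZIP directory.

    expected_size is assumed to come from the server's Content-Length header.
    """
    if not os.path.exists(path):
        return False
    # A download that stopped early is smaller than Content-Length.
    if os.path.getsize(path) != expected_size:
        return False
    # is_zipfile looks for the end-of-central-directory record, which sits at
    # the very end of the archive and is missing from a truncated file.
    return zipfile.is_zipfile(path)
```

If the check fails, the partial file can be deleted and the download retried, instead of passing it on to the unzip step.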

I experienced the same error.
Update on this.
I checked on Sat, Jan 30, and was able to completely download that zip file. But again this morning, I am getting the same zipfile.BadZipfile: File is not a zip file error. And once again, when I point Chrome at http://campaignfinance.cdn.sos.ca.gov/dbwebexport.zip, the file seems to start downloading, then stops abruptly, and I cannot open the zip file.

More troubleshooting details:
- Sending a HEAD request to the zip file URL results in a successful response (i.e., HTTP response status code 200).
- Based on the Last-Modified value in the response headers, it seems like this should be a new file posted today. Here are all the headers:
{
'Content-Length': '764145684',
'Accept-Ranges': 'bytes',
'Expires': 'Mon, 01 Feb 2016 17:51:17 GMT',
'Last-Modified': 'Mon, 01 Feb 2016 12:20:05 GMT',
'Connection': 'keep-alive',
'ETag': '41cf48801c8a49e387987e925ee4137d',
'X-Timestamp': '1454329204.65535',
'Cache-Control': 'public, max-age=541',
'X-Trans-Id': 'tx4595828ccf3d4004b55b2-0056af9791dfw1',
'Date': 'Mon, 01 Feb 2016 17:42:16 GMT',
'Content-Type': 'application/zip',
'X-Object-Meta-Screenie': '41cf48801c8a49e387987e925ee4137d'
}
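The HEAD check described above can be reproduced with the Python standard library. It is roughly the stdlib counterpart of the requests.head(self.url) call that appears in a traceback later in this thread. This is a minimal sketch: head_request and fetch_headers are hypothetical names, and the URL is the one quoted here, which may no longer serve the file.

```python
from urllib.request import Request, urlopen

# URL quoted in this thread; it may no longer be live.
URL = "http://campaignfinance.cdn.sos.ca.gov/dbwebexport.zip"


def head_request(url=URL):
    """Build a HEAD request, which asks for the headers without the ~750 MB body."""
    return Request(url, method="HEAD")


def fetch_headers(url=URL, timeout=30):
    """Send the HEAD request; return the status code and headers as a dict."""
    with urlopen(head_request(url), timeout=timeout) as response:
        return response.status, dict(response.headers.items())
```

For example, status, headers = fetch_headers() followed by headers["Last-Modified"]. A 200 here only shows that the server knows about the file. As this thread shows, the GET for the body can still be cut off.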
Today I was only able to download 6 of 750 MBs before getting the zipfile.BadZipfile: File is not a zip file error. On the second attempt, I got a requests.exceptions.ConnectionError: ('Connection aborted.', BadStatusLine("''",)) error.
:disappointed:
Here are the contents of today's response header:
{
'Content-Length': '768221570',
'Accept-Ranges': 'bytes',
'Expires': 'Tue, 02 Feb 2016 17:41:36 GMT',
'Last-Modified': 'Tue, 02 Feb 2016 12:20:06 GMT',
'Connection': 'keep-alive',
'ETag': 'b88dcc617780ff1eb674008f66be552f',
'X-Timestamp': '1454415605.30877',
'Cache-Control': 'public, max-age=900',
'X-Trans-Id': 'tx8648e86097ac484c954c9-0056b0e6ccdfw1',
'Date': 'Tue, 02 Feb 2016 17:26:36 GMT',
'Content-Type': 'application/zip',
'X-Object-Meta-Screenie': 'b88dcc617780ff1eb674008f66be552f'
}
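In these responses the ETag and X-Object-Meta-Screenie values are identical 32-character hex strings. Headers such as X-Trans-Id and X-Timestamp suggest an OpenStack Swift-style object store, where the ETag of an ordinary (non-segmented) object is the MD5 of its contents. If that assumption holds, a finished download could be checked against the ETag. This is only a sketch, and md5_matches_etag is hypothetical. It would not apply to the July 2017 response at the end of this thread, which comes from Apache and has an ETag of a different form ("2320c8-305b5d54-9ab7f700", whose middle field 0x305b5d54 equals the Content-Length, 811294036).

```python
import hashlib


def md5_matches_etag(path, etag, chunk_size=1 << 20):
    """Compare a file's MD5 digest with an ETag that is assumed to be an MD5 hex string."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in 1 MiB chunks so a ~750 MB archive is never held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    # ETags are sometimes quoted; normalize before comparing.
    return digest.hexdigest() == etag.strip('"').lower()
```

For example, md5_matches_etag("dbwebexport.zip", "b88dcc617780ff1eb674008f66be552f") for the Feb. 2 file.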
How do we normally handle errors like this? Do we just throw the error returned? I imagine this will happen again, so I'm curious what the best approach for dealing with this issue would be.
FWIW, I was able to download the zip just now

The zip file became available around 2:30 PST this afternoon.
@aboutaaron We could definitely throw a more explicit error, and encourage the user to try pointing their web browser at the url, contacting the SoS directly, trolling them on Twitter, etc.
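A more explicit error could look something like the following. This is a minimal sketch: CalAccessDownloadError and open_snapshot are hypothetical names, not the app's current code. It uses Python 3's spelling, zipfile.BadZipFile. The Python 2.7 tracebacks here show BadZipfile, which Python 3 still keeps as an alias.

```python
import zipfile


class CalAccessDownloadError(Exception):
    """Hypothetical exception with a more actionable message than BadZipFile."""


def open_snapshot(path):
    """Open the downloaded archive, translating a corrupt ZIP into a clearer error."""
    try:
        return zipfile.ZipFile(path)
    except zipfile.BadZipFile as exc:
        raise CalAccessDownloadError(
            f"{path} is not a complete ZIP file. The download from the California "
            "Secretary of State was probably cut off. Try opening the URL in a web "
            "browser, try again later, or contact the Secretary of State's office."
        ) from exc
```

Callers would use it the same way as ZipFile, e.g. with open_snapshot(path) as zf: zf.extractall(data_dir). The original BadZipFile stays attached as the exception's cause for debugging.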
I encountered this same error earlier, but was able to download the file later in the day -- consistent with @gordonje's experience.
Welp, not working this morning.
First attempt only downloads 102 of 752 MBs, then I get the zipfile.BadZipfile: File is not a zip file error.
Second attempt throws requests.exceptions.ConnectionError: ('Connection aborted.', BadStatusLine("''",)).
And based on the header, this should be a new file posted early this morning at the usual time:
{
'Content-Length': '769605861',
'Accept-Ranges': 'bytes',
'Expires': 'Wed, 03 Feb 2016 18:20:24 GMT',
'Last-Modified': 'Wed, 03 Feb 2016 12:20:06 GMT',
'Connection': 'keep-alive',
'ETag': '9080338c23b240c5bb0905218e67269b',
'X-Timestamp': '1454502005.16910',
'Cache-Control': 'public, max-age=900',
'X-Trans-Id': 'tx31ae0f2984744a88874c0-0056b24164dfw1',
'Date': 'Wed, 03 Feb 2016 18:05:24 GMT',
'Content-Type': 'application/zip',
'X-Object-Meta-Screenie': '9080338c23b240c5bb0905218e67269b'
}
Will try again later this afternoon.
Well, that sucks haha. I wonder what's happening on their end
Reply to this email directly or view it on GitHub (https://github.com/california-civic-data-coalition/django-calaccess-raw-data/issues/1195#issuecomment-179382725).
Do we have a contact there, or do they have a contact page? Seems worthwhile to reach out, since it's an ongoing issue.
I was able to download a file yesterday evening. Looks like it went up at 6:21 PM GMT. Another file was posted this morning at 12:20 GMT, and I was able to download and unzip it.
Even still, because of the recent intermittent service and because we're on the cusp of being able to deploy the raw-data app somewhere it will run on daily, I think it's still worth reaching out to them, if only to ask "When should we schedule our requests for the new zip files?"
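The posting time varies across this thread. Last-Modified is usually 12:20 GMT, but one file went up at 6:21 PM GMT and another at 14:51:45 GMT. So instead of fixing a time, a daily job could poll the HEAD headers and download only when they change. Below is a minimal sketch. last_seen is hypothetical state saved after the previous successful download. A changed ETag does not guarantee the new file is downloadable, so this would still need to be paired with a completeness check.

```python
def is_new_snapshot(headers, last_seen):
    """Return True if a HEAD response describes a file that hasn't been fetched yet.

    last_seen is a hypothetical dict of the ETag and Last-Modified values saved
    after the previous successful download (None on the first run).
    """
    if not last_seen:
        return True
    return any(
        headers.get(name) != last_seen.get(name)
        for name in ("ETag", "Last-Modified")
    )


# Values copied from the Feb. 2 and Feb. 3, 2016 HEAD responses in this thread.
feb_2 = {"ETag": "b88dcc617780ff1eb674008f66be552f",
         "Last-Modified": "Tue, 02 Feb 2016 12:20:06 GMT"}
feb_3 = {"ETag": "9080338c23b240c5bb0905218e67269b",
         "Last-Modified": "Wed, 03 Feb 2016 12:20:06 GMT"}

print(is_new_snapshot(feb_3, last_seen=feb_2))  # True: a new file was posted
print(is_new_snapshot(feb_3, last_seen=feb_3))  # False: nothing new yet
```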
Down again today. Same deal with the zipfile.BadZipfile or requests.exceptions.ConnectionError: ('Connection aborted.', BadStatusLine("''",)) error.
Looks like it should be a new file for today.
{
'Content-Length': '769895271',
'Accept-Ranges': 'bytes',
'Expires': 'Mon, 08 Feb 2016 20:36:03 GMT',
'Last-Modified': 'Mon, 08 Feb 2016 12:20:05 GMT',
'Connection': 'keep-alive',
'ETag': 'c4f13c4c272ef4bc18a6831735c0385c',
'X-Timestamp': '1454934004.50330',
'Cache-Control': 'public, max-age=900',
'X-Trans-Id': 'tx8dc685337924407087783-0056b8f8abdfw1',
'Date': 'Mon, 08 Feb 2016 20:21:03 GMT',
'Content-Type': 'application/zip',
'X-Object-Meta-Screenie': 'c4f13c4c272ef4bc18a6831735c0385c'
}
I had the same error moments ago, when the file was truncated long before it finished downloading.

It's working this morning. The current file was posted at 14:51:45 GMT, a little later than what has been typical.
Not working this morning. Two attempts, both with truncated zip files.
Looks like a new one was posted at the regular time:
{
'Content-Length': '770040404',
'Accept-Ranges': 'bytes',
'Expires': 'Wed, 10 Feb 2016 18:14:13 GMT',
'Last-Modified': 'Wed, 10 Feb 2016 12:20:06 GMT',
'Connection': 'keep-alive',
'ETag': '7892029f45e4b1dbba0253f9fa0658c4',
'X-Timestamp': '1455106805.44704',
'Cache-Control': 'public, max-age=865',
'X-Trans-Id': 'tx2c18e38be9f04247bd0de-0056bb7a94dfw1',
'Date': 'Wed, 10 Feb 2016 17:59:48 GMT',
'Content-Type': 'application/zip',
'X-Object-Meta-Screenie': '7892029f45e4b1dbba0253f9fa0658c4'
}
My call to the Secretary of State's office today about this issue was ignored.
I am having the error again at home tonight.
ben@bunkerhill ~/Code/django-calaccess-raw-data
% python example/manage.py updatecalaccessrawdata
The currently available CAL-ACCESS snapshot was released by the California Secretary of State on Feb. 10, 2016, at 4:20 a.m. Pacific Time.
It is 734M in size. You downloaded up to 20M as of 4 minutes ago.
Do you want to download it to /home/ben/Code/django-calaccess-raw-data/example/data and update your local database?
Type 'yes' to do it, or 'no' to back out:
yes
Downloading ZIP file
[################################] 36864/731513 - 00:01:07
Unzipping archive
Traceback (most recent call last):
  File "example/manage.py", line 9, in <module>
    execute_from_command_line(sys.argv)
  File "/home/ben/.virtualenvs/django-calaccess-raw-data/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 353, in execute_from_command_line
    utility.execute()
  File "/home/ben/.virtualenvs/django-calaccess-raw-data/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 345, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/home/ben/.virtualenvs/django-calaccess-raw-data/local/lib/python2.7/site-packages/django/core/management/base.py", line 348, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/home/ben/.virtualenvs/django-calaccess-raw-data/local/lib/python2.7/site-packages/django/core/management/base.py", line 399, in execute
    output = self.handle(*args, **options)
  File "calaccess_raw/management/commands/updatecalaccessrawdata.py", line 159, in handle
    noinput=options['noinput'],
  File "/home/ben/.virtualenvs/django-calaccess-raw-data/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 119, in call_command
    return command.execute(*args, **defaults)
  File "/home/ben/.virtualenvs/django-calaccess-raw-data/local/lib/python2.7/site-packages/django/core/management/base.py", line 399, in execute
    output = self.handle(*args, **options)
  File "calaccess_raw/management/commands/downloadcalaccessrawdata.py", line 146, in handle
    self.unzip()
  File "calaccess_raw/management/commands/downloadcalaccessrawdata.py", line 236, in unzip
    with zipfile.ZipFile(self.zip_path) as zf:
  File "/usr/lib/python2.7/zipfile.py", line 770, in __init__
    self._RealGetContents()
  File "/usr/lib/python2.7/zipfile.py", line 811, in _RealGetContents
    raise BadZipfile, "File is not a zip file"
zipfile.BadZipfile: File is not a zip file
The ZIP was not available to me this morning.
It seemed that the entire Secretary of State website was down when I tried to visit it in my browser from my home Wi-Fi. That was confirmed by just-ping.com, which returned the following results.

Here is the error I got during my download attempt with our software.
% python example/manage.py updatecalaccessrawdata
Traceback (most recent call last):
  File "example/manage.py", line 9, in <module>
    execute_from_command_line(sys.argv)
  File "/home/ben/.virtualenvs/django-calaccess-raw-data/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 353, in execute_from_command_line
    utility.execute()
  File "/home/ben/.virtualenvs/django-calaccess-raw-data/local/lib/python2.7/site-packages/django/core/management/__init__.py", line 345, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/home/ben/.virtualenvs/django-calaccess-raw-data/local/lib/python2.7/site-packages/django/core/management/base.py", line 348, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/home/ben/.virtualenvs/django-calaccess-raw-data/local/lib/python2.7/site-packages/django/core/management/base.py", line 399, in execute
    output = self.handle(*args, **options)
  File "calaccess_raw/management/commands/updatecalaccessrawdata.py", line 120, in handle
    download_metadata = self.get_download_metadata()
  File "calaccess_raw/management/commands/__init__.py", line 54, in get_download_metadata
    request = requests.head(self.url)
  File "/home/ben/.virtualenvs/django-calaccess-raw-data/local/lib/python2.7/site-packages/requests/api.py", line 93, in head
    return request('head', url, **kwargs)
  File "/home/ben/.virtualenvs/django-calaccess-raw-data/local/lib/python2.7/site-packages/requests/api.py", line 53, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/ben/.virtualenvs/django-calaccess-raw-data/local/lib/python2.7/site-packages/requests/sessions.py", line 468, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/ben/.virtualenvs/django-calaccess-raw-data/local/lib/python2.7/site-packages/requests/sessions.py", line 576, in send
    r = adapter.send(request, **kwargs)
  File "/home/ben/.virtualenvs/django-calaccess-raw-data/local/lib/python2.7/site-packages/requests/adapters.py", line 437, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='campaignfinance.cdn.sos.ca.gov', port=80): Max retries exceeded with url: /dbwebexport.zip (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f2ee37343d0>: Failed to establish a new connection: [Errno -2] Name or service not known',))
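Failures like this DNS error ("Name or service not known") and the earlier BadStatusLine aborts have been intermittent, and several people here report success a few hours later. One option is to retry with exponential backoff. Below is a minimal sketch; with_retries is hypothetical and the delays are illustrative. requests' ConnectionError is not a subclass of Python's built-in ConnectionError. However, requests' exceptions derive from IOError (OSError in Python 3), so the OSError default covers them. For a narrower net, pass requests.exceptions.ConnectionError in retry_on.

```python
import time


def with_retries(func, attempts=4, base_delay=60.0, retry_on=(OSError,), sleep=time.sleep):
    """Call func(), retrying on the given exceptions with exponential backoff.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts and
    re-raises the last error once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

For example, with_retries(lambda: requests.head(url)). The sleep parameter exists so the waits can be replaced in tests.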
It is working for me now. I just downloaded the zip with a Python script.
@ssdatar We're sure this issue is entirely on the SoS's end. It's also intermittent and unpredictable. @palewire recently got some inside info suggesting it's a result of some monthly internal CAL-ACCESS processes, which he might want to elaborate on.
But we are keeping this ticket open in order to document any instance when the file is not available so that we can refer responsible parties to the relevant details. Other than that, there's nothing to do on this one.
The bug label might be a little confusing for this ticket since the issue is largely out of our control. Gonna remove it now unless anyone objects.
On Saturday, August 6, 2016, we had a ZIP download fail. We don't know why. An attempt to resume it on Saturday also failed. We then deleted the partial download and started from scratch, and that worked.
It happened again the next day and a full wipe and redownload worked.
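The server advertises Accept-Ranges: bytes in the headers above, so resuming with an HTTP Range request is possible in principle. A partial file is only safe to extend if the server still has the same file, for example the same ETag. Otherwise the full wipe-and-redownload that worked here is the safer fallback. Below is a minimal sketch of that decision. plan_download is hypothetical and only decides; it does not download anything. A resumed request should come back as 206 Partial Content. A 200 means the server ignored the range and is sending the whole file.

```python
def plan_download(partial_size, content_length, accept_ranges, same_etag):
    """Decide whether to resume a partial download or start over.

    Returns an (action, extra_headers) tuple, where action is "done",
    "resume" or "restart". same_etag says whether the server's current ETag
    matches the one recorded when the partial download began.
    """
    if not same_etag:
        # A new snapshot was posted; the partial bytes belong to the old one.
        return "restart", {}
    if partial_size == content_length:
        return "done", {}
    if 0 < partial_size < content_length and accept_ranges == "bytes":
        # Ask for the remaining bytes only; expect a 206 Partial Content reply.
        return "resume", {"Range": f"bytes={partial_size}-"}
    # Empty, oversized, or range requests unsupported: delete and redownload.
    return "restart", {}
```

A "done" result only means the sizes match, so the archive would still need to be validated before unzipping.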
As of this morning, July 5, 2017, the CAL-ACCESS bulk download has not been updated in five days, since June 30, 2017.
$ date
Wed Jul 5 12:07:58 PDT 2017
$ curl -I HEAD http://campaignfinance.cdn.sos.ca.gov/dbwebexport.zip
HTTP/1.1 200 OK
Server: Apache/2.2.3 (Red Hat)
Last-Modified: Fri, 30 Jun 2017 11:20:28 GMT
ETag: "2320c8-305b5d54-9ab7f700"
Accept-Ranges: bytes
Content-Length: 811294036
Content-Type: application/zip
Date: Wed, 05 Jul 2017 19:08:01 GMT
Connection: keep-alive
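The staleness can be computed from the two headers in that response. This avoids depending on the local machine's clock. Below is a minimal sketch; the function names are hypothetical, and the 36-hour threshold is an arbitrary illustrative value, not something from this thread.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def snapshot_age(headers):
    """Time between the server's Date header and the file's Last-Modified header."""
    served_at = parsedate_to_datetime(headers["Date"])
    modified_at = parsedate_to_datetime(headers["Last-Modified"])
    return served_at - modified_at


def is_stale(headers, max_age=timedelta(hours=36)):
    """Flag a snapshot older than max_age (36 hours is an illustrative choice)."""
    return snapshot_age(headers) > max_age


july_5 = {"Last-Modified": "Fri, 30 Jun 2017 11:20:28 GMT",
          "Date": "Wed, 05 Jul 2017 19:08:01 GMT"}
print(snapshot_age(july_5))  # 5 days, 7:47:33
print(is_stale(july_5))      # True
```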