
Chicago Pedestrian Stops data cannot be loaded

Open sowdm opened this issue 2 years ago • 5 comments

Loading the Chicago Pedestrian Stops data returns a 403 Forbidden error. The solutions at the links below have already been tried, and the robots.txt file for the website appears to be empty.

https://stackoverflow.com/questions/62278538/pd-read-csv-produces-httperror-http-error-403-forbidden
https://stackoverflow.com/questions/54540901/why-am-i-getting-a-http-403-error-with-pandas

sowdm avatar Apr 20 '23 22:04 sowdm

This seems to work:

import pandas as pd

url = "https://home.chicagopolice.org/wp-content/uploads/2022-ISR.zip"
storage_options = {'User-Agent': 'Mozilla/5.0'}
df = pd.read_csv(url, storage_options=storage_options)

UPDATE: I tried this again on my work computer and it no longer works.

sowdm avatar Apr 25 '23 18:04 sowdm

This seems to work:

import pandas as pd

url = "https://home.chicagopolice.org/wp-content/uploads/2022-ISR.zip"
storage_options = {'User-Agent': 'Mozilla/5.0'}
df = pd.read_csv(url, storage_options=storage_options)

This does not work on my home computer.

sowdm avatar Apr 25 '23 23:04 sowdm

https://home.chicagopolice.org/robots.txt returns a blank page. I assume this means that nothing is disallowed.

sowdm avatar Apr 27 '23 13:04 sowdm

This might be worth testing: https://stackoverflow.com/questions/16627227/problem-http-error-403-in-python-3-web-scraping
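
For reference, the approach in that answer amounts to building the request by hand with a browser-like User-Agent header and passing the downloaded bytes to pandas. Below is a minimal sketch of that idea, not a confirmed fix: the helper names `build_request` and `read_zipped_csv` are hypothetical, and whether the server accepts the request may still depend on the network, as the earlier comments show.

```python
import io
import urllib.request

import pandas as pd


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def read_zipped_csv(url):
    """Download a ZIP that holds a single CSV and parse it with pandas.

    Hypothetical helper: the server may still answer 403 on some networks.
    """
    with urllib.request.urlopen(build_request(url)) as resp:
        payload = io.BytesIO(resp.read())
    # compression="zip" only works when the archive contains exactly one file.
    return pd.read_csv(payload, compression="zip")
```

Usage would be `df = read_zipped_csv("https://home.chicagopolice.org/wp-content/uploads/2022-ISR.zip")`.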

sowdm avatar Apr 27 '23 16:04 sowdm

All data has been added successfully except the 2019 data.

The 2019 ZIP file contains multiple CSV files, which pd.read_csv does not handle:

> url = 'https://home.chicagopolice.org/wp-content/uploads/2019-ISR.zip'
> storage_options = {'User-Agent': 'Mozilla/5.0'} 
> table = pd.read_csv(url, encoding_errors='surrogateescape', storage_options=storage_options)
Exception has occurred: ValueError
Multiple files found in ZIP file. Only one file per ZIP: ['2019-ISR-Jan-Jun.csv', '2019-ISR-Jul-Dec.csv']
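
One possible workaround is to open the archive with the standard-library `zipfile` module and read each CSV member separately before concatenating. This is a minimal sketch, assuming the ZIP bytes have already been downloaded (for example with a request that sets the User-Agent header) and that both CSV files share the same columns. The function name `read_multi_csv_zip` is hypothetical.

```python
import io
import zipfile

import pandas as pd


def read_multi_csv_zip(zip_bytes, **read_csv_kwargs):
    """Concatenate every CSV member of a ZIP archive into one DataFrame.

    Assumes all CSV members share the same columns. Raises ValueError
    (from pd.concat) if the archive contains no CSV files.
    """
    frames = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip non-CSV members
            with archive.open(name) as member:
                frames.append(pd.read_csv(member, **read_csv_kwargs))
    return pd.concat(frames, ignore_index=True)
```

Extra keyword arguments such as `encoding_errors='surrogateescape'` are passed through to `pd.read_csv` for each file.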

sowdm avatar Sep 27 '23 21:09 sowdm