py-openaq
Python 2.7 and encoding of city names
Seems to be something wonky with v1.0.0 and Python 2.7 (look at cities in Chile)
David: this does NOT cause problems under Python 2.7:
```python
from openaq import OpenAQ

coldict = {'coordinates.latitude': 'lat', 'coordinates.longitude': 'lon'}
api = OpenAQ()

df = api.measurements(country='CL', limit=10000, df=True).rename(columns=coldict)
out = {}
stats = {}
for loc, data in df.groupby(['city', 'location', 'parameter']):
    data = data.resample('1h').mean()
    out[loc] = data
    stats[loc] = (len(data), min(data.value), max(data.value))
```
@sergiolucero Interesting. Can you provide me with an example where it does fail to properly encode them?
This will produce an error when the location has an accent (Estación Centro):
from openaq import OpenAQ
api=OpenAQ()
df=api.measurements(country='CL',city='Calama',df=True,limit=100)
for loc, data in df.groupby(['city','location']):
    print loc
    dfloc = api.latest(city=loc[0],location=loc[1])
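
To see which locations trigger the error instead of stopping at the first one, the loop above could record failures per (city, location) pair. This is only a sketch: `find_failing_pairs` is a hypothetical helper, not part of py-openaq, and `fetch` stands in for a call such as `api.latest`.

```python
def find_failing_pairs(pairs, fetch):
    """Call fetch(city=..., location=...) for each pair and collect failures.

    `fetch` is an assumption: any callable that accepts `city` and
    `location` keyword arguments, for example api.latest.
    """
    failures = []
    for city, location in pairs:
        try:
            fetch(city=city, location=location)
        except Exception as exc:  # broad on purpose: the exception type is not known here
            failures.append((city, location, type(exc).__name__))
    return failures
```

The pairs could be the group keys from the snippet above, and the returned list would show whether only accented names fail.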
Ahh okay. So the first one didn't raise an error just by chance...since there were no accented cities?
No, somehow requesting again exposes the encoding problem?
Hmm. Okay. That seems like it's probably on the end of the OpenAQ API then?
@dhhagan It's a known issue on the API, see openaq/openaq-api#275
@dolugen Ahh good to know. I began to make a spreadsheet of all the offending instances, but once it reached a few hundred I gave up :/ It would certainly be nice to fix though! Hopefully, someone has time to tackle it this October.
I'm not sure if this'll be fixed in OpenAQ. Looking over the issue history, looks like it's an issue with the source. So if we're getting bad characters from the source, I think we're just capturing them. Maybe we could always check for UTF-8 and discard items that don't pass?
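
A rough sketch of what that check could look like, assuming the raw values arrive as byte strings. The helper names `is_valid_utf8` and `keep_valid` are made up for illustration and are not part of OpenAQ or py-openaq.

```python
def is_valid_utf8(raw):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


def keep_valid(items):
    """Drop byte strings that are not valid UTF-8 (hypothetical filter step)."""
    return [item for item in items if is_valid_utf8(item)]
```

One caveat: this only catches bytes that are not valid UTF-8. Text that was already decoded with the wrong codec upstream and then re-encoded would still pass, so it would not catch every bad name.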