py-openaq

Python 2.7 and encoding of city names

Open · dhhagan opened this issue 7 years ago · 9 comments

There seems to be something wonky with v1.0.0 and Python 2.7 (look at the cities in Chile).

dhhagan · Feb 03 '17 23:02

David: this does NOT cause problems with Python 2.7:

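```python
from openaq import OpenAQ

# Shorter names for the coordinate columns
coldict = {'coordinates.latitude': 'lat', 'coordinates.longitude': 'lon'}
api = OpenAQ()

df = api.measurements(country='CL', limit=10000, df=True).rename(columns=coldict)
out = {}
stats = {}

# Hourly means for each (city, location, parameter) group
for loc, data in df.groupby(['city', 'location', 'parameter']):
    data = data.resample('1h').mean()
    out[loc] = data
    # (hourly rows, min, max); Series.min/max skip the NaNs that resampling can introduce
    stats[loc] = (len(data), data.value.min(), data.value.max())
```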

sergiolucero · Feb 07 '17 15:02
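To see whether a given pull of data contains any accented names at all, the group keys can be scanned for non-ASCII characters. A minimal sketch, assuming keys shaped like the `(city, location, parameter)` tuples in the snippet above; `non_ascii_keys` is a hypothetical helper and the sample keys are illustrative, not real API output.

```python
def non_ascii_keys(keys):
    """Return the group keys (tuples of names) that contain any non-ASCII character."""
    def has_non_ascii(text):
        # Any code point above 127 is outside plain ASCII
        return any(ord(ch) > 127 for ch in text)
    return [key for key in keys if any(has_non_ascii(part) for part in key)]


# Hypothetical keys, shaped like those from df.groupby(['city', 'location', 'parameter'])
keys = [
    ('Calama', 'Estación Centro', 'pm10'),
    ('Santiago', "Parque O'Higgins", 'pm25'),
]
print(non_ascii_keys(keys))  # [('Calama', 'Estación Centro', 'pm10')]
```

With real data, the keys could come from `df.groupby(['city', 'location', 'parameter']).groups.keys()`.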

@sergiolucero Interesting. Can you give me an example where it does fail to encode them properly?

dhhagan · Feb 07 '17 18:02

This produces an error when the location name has an accent (e.g. Estación Centro):

```python
from openaq import OpenAQ

api = OpenAQ()
df = api.measurements(country='CL', city='Calama', df=True, limit=100)

for loc, data in df.groupby(['city', 'location']):
    print(loc)
    # Fails for locations with accented names, e.g. 'Estación Centro'
    dfloc = api.latest(city=loc[0], location=loc[1])
```

sergiolucero · Feb 07 '17 18:02
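Since the failure shows up when an accented name is sent back in a second request, it can help to know what a correctly encoded query for that name looks like. A minimal sketch using the Python 3 standard library, not py-openaq's own request code; the `city`/`location` parameter names mirror the `api.latest(...)` call above, but whether py-openaq builds its query string this way is an assumption. On Python 2.7 the equivalent function is `urllib.urlencode`, and unicode values generally need `.encode('utf-8')` before being passed to it.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical query for the failing call: city and location taken from the groupby key
params = {'city': 'Calama', 'location': 'Estación Centro'}

# urlencode() encodes str values as UTF-8 and percent-escapes them (spaces become '+')
query = urlencode(params)
print(query)  # city=Calama&location=Estaci%C3%B3n+Centro

# Decoding the query string recovers the original accented name
assert parse_qs(query)['location'] == ['Estación Centro']
```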

Ah, okay. So the first one didn't raise an error just by chance, since there were no accented cities?

dhhagan · Feb 07 '17 18:02

No. Somehow, requesting again (passing the returned names back to the API) exposes the encoding problem?

sergiolucero · Feb 07 '17 19:02

Hmm, okay. Then it's probably on the OpenAQ API's end?

dhhagan · Feb 07 '17 20:02

@dhhagan It's a known issue with the API; see openaq/openaq-api#275.

dolugen · Sep 29 '17 02:09

@dolugen Ah, good to know. I started making a spreadsheet of all the offending instances, but once it reached a few hundred I gave up :/ It would certainly be nice to fix, though! Hopefully someone has time to tackle it this October.

dhhagan · Sep 29 '17 02:09
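Rather than collecting offending names by hand, candidates could be flagged automatically. A rough heuristic sketch, assuming the bad names are UTF-8 text that was decoded as Latin-1 somewhere upstream (the usual source of strings like `EstaciÃ³n`, though the thread does not confirm that this is what happens here). `looks_like_mojibake` and `find_offending` are hypothetical helpers, and `'EstaciÃ³n Centro'` is an illustrative sample, not a value taken from the API.

```python
def looks_like_mojibake(text):
    """Heuristic: True if `text` looks like UTF-8 bytes that were decoded as Latin-1.

    Re-encoding as Latin-1 and decoding as UTF-8 only succeeds (and changes the
    string) when the underlying bytes were valid UTF-8 misread as Latin-1.
    """
    try:
        repaired = text.encode('latin-1').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in Latin-1, or not valid UTF-8 underneath: leave it alone
        return False
    return repaired != text


def find_offending(names):
    """Return (name, repaired_name) pairs for names that look mis-encoded."""
    return [(n, n.encode('latin-1').decode('utf-8'))
            for n in names if looks_like_mojibake(n)]


names = ['Estación Centro', 'EstaciÃ³n Centro', 'Calama']
for bad, fixed in find_offending(names):
    print(repr(bad), '->', repr(fixed))  # 'EstaciÃ³n Centro' -> 'Estación Centro'
```

Being a heuristic, this can miss names mangled through a different codec (e.g. cp1252 with characters in the 0x80 to 0x9F range) and could in principle flag a legitimate Latin-1 name that happens to form valid UTF-8.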

I'm not sure this will be fixed in OpenAQ. Looking over the issue history, it looks like the problem is with the source: if the source sends bad characters, we're just capturing them as-is. Maybe we could always check for valid UTF-8 and discard items that don't pass?

jflasher · Sep 29 '17 18:09
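The "check for UTF-8 and discard" idea can be sketched as below. This assumes records arrive as dicts whose `city` and `location` values are still raw bytes; where such a check would actually sit in the OpenAQ ingestion pipeline is not specified in the thread, and `split_by_utf8` is a hypothetical helper with illustrative sample records.

```python
def split_by_utf8(records, fields=('city', 'location')):
    """Split raw records into (kept, discarded).

    A record is kept only if every field in `fields` holds bytes that decode as
    UTF-8; kept records get those fields replaced by the decoded str.
    """
    kept, discarded = [], []
    for record in records:
        try:
            decoded = {f: record[f].decode('utf-8') for f in fields}
        except UnicodeDecodeError:
            discarded.append(record)
            continue
        kept.append({**record, **decoded})
    return kept, discarded


# Hypothetical raw records as they might arrive from a source
raw = [
    {'city': b'Calama', 'location': 'Estación Centro'.encode('utf-8')},  # valid UTF-8
    {'city': b'Calama', 'location': b'Estaci\xf3n Centro'},  # Latin-1 bytes, not valid UTF-8
]
kept, discarded = split_by_utf8(raw)
print([r['location'] for r in kept])  # ['Estación Centro']
print(len(discarded))  # 1
```

Note that text which was already mis-decoded and then re-encoded (e.g. `EstaciÃ³n`) is still valid UTF-8 and would pass this check, so it only catches bytes that are not UTF-8 at all.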