Monitor data.qld.gov.au for #318 fix

Open Stephen-Gates opened this issue 6 years ago • 2 comments

Monitor data.qld.gov.au for changes and, when they are implemented, check that #318 is resolved.

Stephen-Gates avatar Mar 16 '18 10:03 Stephen-Gates

portal support report

I think the issue here is with how Python handles the encoding of Unicode characters, in this case em dashes.

From https://stackoverflow.com/questions/11674850/issue-with-content-in-urllib2-urlopen:

"That's an "em dash", which is Unicode code point U+2014 and is encoded in Windows-1252 as 0x97 (but it is not part of ISO 8859-1)."
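The encoding claim quoted from Stack Overflow can be checked directly. This is a minimal Python 3 sketch (the variable names are mine, and it does not touch CKAN or data.qld.gov.au); it only shows how the em dash behaves under the three encodings mentioned.

```python
# U+2014 is the em dash.
EM_DASH = "\u2014"

# Windows-1252 maps the em dash to the single byte 0x97.
cp1252_bytes = EM_DASH.encode("cp1252")

# UTF-8 needs three bytes for the same character.
utf8_bytes = EM_DASH.encode("utf-8")

# ISO 8859-1 has no em dash, so encoding to it raises an error.
try:
    EM_DASH.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# Decoding the Windows-1252 byte as ISO 8859-1 does not fail, but it
# yields the C1 control character U+0097 instead of an em dash.
misdecoded = b"\x97".decode("iso-8859-1")

print(cp1252_bytes, utf8_bytes, latin1_ok, repr(misdecoded))
```

If a server sends Windows-1252 bytes and the client assumes ISO 8859-1 or UTF-8, the em dash is either silently replaced by a control character or triggers a decode error, which would be consistent with the behaviour described here.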

This is the documentation on Unicode for CKAN - http://docs.ckan.org/en/ckan-2.7.3/contributing/unicode.html

Other CKAN discussions around Unicode issues - https://github.com/ckan/ckan/issues/3006

I’m afraid there’s no simple solution for this, at least from my end. I’ll keep investigating for now.

I have some simple python code that makes request to a server

    html_page = urllib2.urlopen(baseurl, timeout=20)
    print html_page.read()
    html_page.close()

when i am trying to scrape a page that has ...
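For anyone hitting the same thing on the client side, here is a rough sketch of decoding response bytes defensively. It is written for Python 3 rather than the `urllib2` code quoted above, the function name `decode_body` and its parameters are hypothetical, and the fallback order (declared charset, then UTF-8, then Windows-1252) is an assumption, not something the portal or CKAN prescribes.

```python
from typing import Optional


def decode_body(raw: bytes, declared_charset: Optional[str] = None) -> str:
    """Decode response bytes, trying the declared charset first.

    Hypothetical helper: falls back to UTF-8, then Windows-1252.
    """
    candidates = []
    if declared_charset:
        candidates.append(declared_charset)
    candidates.extend(["utf-8", "cp1252"])

    for charset in candidates:
        try:
            return raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown charset: try the next candidate.
            continue

    # Last resort: keep going, replacing any undecodable bytes.
    return raw.decode("cp1252", errors="replace")


# A lone 0x97 byte is not valid UTF-8, so the Windows-1252 fallback applies.
text_from_cp1252 = decode_body(b"caf\xe9 \x97 bar")
text_from_utf8 = decode_body("caf\u00e9 \u2014 bar".encode("utf-8"), "utf-8")
print(text_from_cp1252 == text_from_utf8)
```

Note that a declared charset of ISO 8859-1 never fails to decode, so a mislabelled Windows-1252 response would still come back with U+0097 in place of the em dash.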

Stephen-Gates avatar Apr 19 '18 20:04 Stephen-Gates

Just FYI so you don't spend unnecessary time on this - I would assume there will be no changes to data.qld.gov.au to support this for milestone 0.18.0

louisjasek avatar May 18 '18 01:05 louisjasek