covid19-data
Broken encoding in OpenData_Slovakia_CovidAutomat.csv
Hello, the district names in OpenData_Slovakia_CovidAutomat.csv are partly garbled.
For example, the districts Stará ?ubov?a and ?adca (correctly Stará Ľubovňa and Čadca). In a hex editor, every one of those question-mark characters shows up as 3F (00111111), which really is a question mark in ASCII.
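The pattern is consistent with names being written through a single-byte Western code page such as cp1252 (or Latin-1) with unencodable characters replaced by `?`: á exists in cp1252, while Ľ, ň and Č do not (a Central European code page such as cp1250, or UTF-8, would have kept them). How the export is actually produced is not stated in this thread, so treat this as an assumption; the helper name `simulate_export` is hypothetical. The short Python check below reproduces the bytes described above.

```python
def simulate_export(name: str, codepage: str = "cp1252") -> bytes:
    """Encode a name the way the export is *assumed* to have done it.

    errors="replace" substitutes b"?" (0x3F) for every character the
    code page cannot represent, so the original letter is lost.
    """
    return name.encode(codepage, errors="replace")


for name in ["Stará Ľubovňa", "Čadca"]:
    damaged = simulate_export(name)
    # Print the damaged bytes, their hex dump, and how they read back as text.
    print(damaged, damaged.hex(" "), damaged.decode("cp1252"))
```

Because Ľ, ň and Č all collapse into the same byte 0x3F, no choice of decoding can restore them; the names either have to be mapped back one by one or re-exported from the source.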
+1 for this issue.
The wrong encoding causes problems for anyone who uses the raw data and filters it, because special characters in the names of some cities and villages are not stored correctly. If needed, I can write a Python script that automatically performs a simple search-and-replace on the CSV file. However, users would have to run it manually, which is not ideal.
@matejmisik if you (or your team) are unable, for whatever reason, to fix the data before uploading it here to GitHub, please let me know and I'll write a simple Python script to fix it.
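The search-and-replace idea above could look roughly like the following sketch. It is not the script mentioned later in this thread; the `NAME_FIXES` table, the `repair_csv` helper, the comma delimiter and the sample column names are all hypothetical and would need to be checked against the real file. Whole fields are matched because a single `?` stands for several different letters.

```python
import csv
import io
from typing import Mapping

# Hypothetical fix-up table: damaged name -> correct name. Only the two
# districts reported in this issue are listed; extend it as needed.
NAME_FIXES = {
    "Stará ?ubov?a": "Stará Ľubovňa",
    "?adca": "Čadca",
}


def repair_csv(text: str, fixes: Mapping[str, str], delimiter: str = ",") -> str:
    """Replace every CSV field that exactly matches a damaged name.

    Whole fields are matched on purpose: "?" replaced several different
    letters, so a character-by-character replacement cannot work.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for row in reader:
        writer.writerow([fixes.get(field, field) for field in row])
    return out.getvalue()


# Made-up sample input; the column names are hypothetical.
sample = "okres,hodnota\nStará ?ubov?a,1\n?adca,2\n"
print(repair_csv(sample, NAME_FIXES))
```

To apply it to the actual file, read the file with the encoding it was really written in and save the repaired text as UTF-8 so the restored letters survive.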
Hello,
it seems the encoding is still broken and the file is not easily readable. @neisor, have you found a way to read this data automatically and reliably?
Hello,
I made a very simple Python script that can repair the broken file. I hope it helps someone.
My solution is to convert OpenData_Slovakia_CovidAutomat.xlsx to CSV with the CloudConvert service. It's free and works perfectly.
```python
import cloudconvert

api_key = "XXXXXXX"  # your CloudConvert API key
cloudconvert.configure(api_key=api_key, sandbox=False)

# One job with three tasks: import the XLSX from GitHub,
# convert it to CSV, and export the result as a download URL.
job = cloudconvert.Job.create(payload={
    "tasks": {
        "import-covid-data": {
            "operation": "import/url",
            "url": "https://github.com/Institut-Zdravotnych-Analyz/covid19-data/raw/main/OpenData_Slovakia_CovidAutomat.xlsx",
            "filename": "OpenData_Slovakia_CovidAutomat.xlsx",
        },
        "convert-covid-data": {
            "operation": "convert",
            "input": "import-covid-data",
            "output_format": "csv",
        },
        "export-covid-data": {
            "operation": "export/url",
            "input": "convert-covid-data",
            "inline": False,
            "archive_multiple_files": False,
        },
    }
})

# Find the export task by name instead of relying on its position in the list.
export_task_id = next(t["id"] for t in job["tasks"] if t.get("name") == "export-covid-data")

# Wait for the export task (and therefore the whole job) to finish.
res = cloudconvert.Task.wait(id=export_task_id)

# Download the exported CSV file.
file = res.get("result").get("files")[0]
cloudconvert.download(filename=file["filename"], url=file["url"])
```
The result is the downloaded file OpenData_Slovakia_CovidAutomat.csv, without any encoding errors.
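To check a converted copy yourself, a small standard-library sketch like the one below can help. It assumes the CSV is UTF-8 and comma-delimited (neither is confirmed in this thread) and treats any literal `?` inside a field as a sign of a lost character; the function names `suspicious_fields` and `check_csv_file` are hypothetical.

```python
import csv
import io
from typing import List, Tuple


def suspicious_fields(text: str, delimiter: str = ",") -> List[Tuple[int, str]]:
    """Return (row number, field) pairs that still contain a literal "?"."""
    hits = []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for row_no, row in enumerate(reader, start=1):
        hits.extend((row_no, field) for field in row if "?" in field)
    return hits


def check_csv_file(path: str, delimiter: str = ",") -> List[Tuple[int, str]]:
    # Strict UTF-8 decoding raises UnicodeDecodeError if the file is not UTF-8.
    with open(path, encoding="utf-8", newline="") as f:
        return suspicious_fields(f.read(), delimiter)


# Usage: an empty list means no field contains "?".
# print(check_csv_file("OpenData_Slovakia_CovidAutomat.csv"))
```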
By the way, I also created a Java wrapper around automat.gov.sk.