google-images-download
UnicodeDecodeError in python2
googleimagesdownload -k 'สวัสดีครับ' -l 5
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/bin/googleimagesdownload", line 10, in <module>
    sys.exit(main())
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/google_images_download/google_images_download.py", line 998, in main
    paths,errors = response.download(arguments) #wrapping response in a variable just for consistency
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/google_images_download/google_images_download.py", line 825, in download
    paths, errors = self.download_executor(arguments)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/google_images_download/google_images_download.py", line 922, in download_executor
    print(iteration.encode('raw_unicode_escape').decode('utf-8'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 29: ordinal not in range(128)
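For anyone reading the traceback: my understanding is that in Python 2 the keyword arrives from the shell as a byte string, and calling .encode() on a byte string first decodes it implicitly as ASCII, which is where the 'ascii' codec in the message comes from. Here is a minimal sketch that mimics that implicit step in Python 3. The helper name implicit_ascii_decode is made up for illustration and is not part of google-images-download.

```python
# Python 3 sketch: mimic Python 2's implicit ASCII decode of a byte string.
# implicit_ascii_decode is a hypothetical helper, not library code.
keyword_bytes = "สวัสดีครับ".encode("utf-8")  # roughly what Python 2 gets from the shell


def implicit_ascii_decode(data):
    """Decode bytes as ASCII, as Python 2 does before str.encode() runs."""
    return data.decode("ascii")


try:
    implicit_ascii_decode(keyword_bytes)
except UnicodeDecodeError as exc:
    error_message = str(exc)
    print(error_message)
```

The first UTF-8 byte of the Thai keyword is 0xe0, the same byte the traceback complains about. The position differs here because the sketch decodes the keyword alone, without any surrounding text.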
+1 to this issue. Being able to pull images for languages other than English would be really, really useful.
If there's any workaround in the meantime, please do share!
Same issue in Python 3.7.
A workaround is to transliterate non-ASCII characters into their closest ASCII representation.
Python 3:
def transliterate(string):
    """Transliterate a string into its closest ASCII representation.

    Ex: 1. àé => ae,
        2. สวัสดีครับ => swasdiikhrab.

    :param string: text to transliterate (str, or UTF-8 encoded bytes)
    :return: closest ASCII string.
    """
    from unidecode import unidecode  # third-party package: pip install unidecode
    if isinstance(string, bytes):
        # unidecode expects text, so decode byte strings first
        string = string.decode('utf-8')
    return unidecode(string)
Python 2:
Replace if isinstance(string, bytes):
with if not isinstance(string, unicode):
@jose-t: Thank you! I was trying to solve the issue and found that translation (as opposed to transliteration) works better in some cases for Google Image search results. Depending on the query, either might give better results, and it's hard to predict which without comparing the results for n queries.
Posting a translate function in case someone wants to use it (note: you'll need a Google Cloud account and must follow the authentication steps: https://cloud.google.com/translate/docs/reference/libraries)
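# Assumes a google-cloud-translate release that ships the translate_v2 module.
from google.cloud import translate_v2 as translate

def translate_text_to_en(query):
    """Translate query into English using the Google Cloud Translation API."""
    translate_client = translate.Client()  # needs Google Cloud credentials configured
    # Pass the text itself rather than encoded bytes
    translation = translate_client.translate(query, target_language='en')
    print(u'Translation: {}'.format(translation['translatedText']))
    return translation['translatedText']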
It's a bug that occurs when printing the exact item name in UTF-8 encoding. If you are using Python 3, you can git clone the project and remove the decode in google_images_download/google_images_download.py,
changing this:
print(iteration.encode('raw_unicode_escape').decode('utf-8'))
into
print(iteration.encode('raw_unicode_escape'))
Then uninstall the original package and run python setup.py install to install the modified one. The name prints badly on screen, but the download works fine.
Using Python 2 runs into another problem...
P.S. I still don't know why the Chinese example on the website passes while other languages hit the decode problem...
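One possible explanation for the Chinese-vs-other-languages difference, at least under Python 3: raw_unicode_escape writes characters above U+00FF as ASCII \uXXXX escapes, which always decode cleanly as UTF-8, but it writes characters in the U+0080 to U+00FF range (for example à or é) as single Latin-1 bytes, which are not valid UTF-8 on their own. The sketch below reproduces the expression from the traceback and shows a hypothetical alternative, console_safe, that escapes only what the console cannot encode. Both function names are mine and are not part of google-images-download.

```python
import sys


def original_print_text(name):
    """Replicate the expression from the traceback (Python 3 semantics)."""
    return name.encode("raw_unicode_escape").decode("utf-8")


def console_safe(name, encoding=None):
    """Hypothetical replacement: escape only characters the console cannot encode."""
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    return name.encode(encoding, errors="backslashreplace").decode(encoding)


for sample in ("中文", "สวัสดี", "àé"):
    try:
        # Characters above U+00FF come back as literal \uXXXX text
        print(repr(original_print_text(sample)))
    except UnicodeDecodeError as exc:
        # Latin-1 range characters become lone bytes that are invalid UTF-8
        print("fails for", ascii(sample), "->", exc)
    print(console_safe(sample))
```

With this approach a UTF-8 terminal shows the name as typed, and an ASCII-only terminal shows escapes instead of raising. Whether this matches every failure reported above is an assumption on my part.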
Have you tried export PYTHONIOENCODING=utf-8?
@DuckSoft That doesn't fix it for me.
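To check whether the PYTHONIOENCODING setting actually took effect, it may help to look at what encoding Python reports for stdout, and to try writing through an explicitly UTF-8 wrapped stream. This is a rough Python 3 sketch using only the standard library. The helper name utf8_stream is made up for illustration.

```python
import io
import sys


def utf8_stream(raw_stream):
    """Wrap a binary stream so that text written to it is encoded as UTF-8."""
    return io.TextIOWrapper(raw_stream, encoding="utf-8", errors="replace")


# Shows the encoding Python picked for stdout (affected by PYTHONIOENCODING)
print("stdout encoding:", sys.stdout.encoding)
```

If the reported encoding is not UTF-8, something like utf8_stream(sys.stdout.buffer) could be used for printing item names. That would be a local experiment, not a change the project has adopted.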