google-images-download

UnicodeDecodeError in python2

Open hardikvasa opened this issue 5 years ago • 6 comments

googleimagesdownload -k 'สวัสดีครับ' -l 5

Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.7/bin/googleimagesdownload", line 10, in <module>
    sys.exit(main())
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/google_images_download/google_images_download.py", line 998, in main
    paths,errors = response.download(arguments)  #wrapping response in a variable just for consistency
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/google_images_download/google_images_download.py", line 825, in download
    paths, errors = self.download_executor(arguments)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/google_images_download/google_images_download.py", line 922, in download_executor
    print(iteration.encode('raw_unicode_escape').decode('utf-8'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe0 in position 29: ordinal not in range(128)
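
The 'ascii' codec in the message suggests the failure comes from an implicit decode, not from the explicit .decode('utf-8'): in Python 2, calling .encode() on a byte string first decodes it with the default ascii codec. Below is a minimal sketch of that step, written for Python 3 so it can still be run; the explicit decode stands in for what Python 2 does implicitly, and first_undecodable_byte is a hypothetical helper name, not part of the library.

```python
def first_undecodable_byte(raw, codec='ascii'):
    """Return (byte value, position) of the first byte `codec` cannot decode, or None."""
    try:
        raw.decode(codec)
    except UnicodeDecodeError as exc:
        # exc.start is the index of the offending byte
        return raw[exc.start], exc.start
    return None


# The keyword as the shell would hand it to Python 2: UTF-8 encoded bytes.
keyword = 'สวัสดีครับ'.encode('utf-8')

print(first_undecodable_byte(keyword))
print(first_undecodable_byte(b'hello'))
```

The first byte of the UTF-8 encoded Thai keyword is 0xe0, the same byte the error message reports.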

hardikvasa avatar May 26 '19 07:05 hardikvasa

+1 to this issue. Being able to pull images for languages other than english would be really, really useful.

If there's any work around in the meanwhile, please do share!

sudcha23 avatar Jul 08 '19 00:07 sudcha23

Same issue in Python 3.7.

A workaround is to replace non-ASCII characters with their closest ASCII representation (transliteration).

Python 3:

def transliterate(string):
    """Transliterate a string into its closest ASCII representation.
    Ex: 1. àé => ae,
        2. สวัสดีครับ => swasdiikhrab.
    :param string: text (str) or UTF-8 encoded bytes
    :return: closest ASCII string.
    """
    # Third-party package: pip install unidecode
    from unidecode import unidecode

    # unidecode expects text, so decode raw bytes first (assumes UTF-8 input)
    if isinstance(string, bytes):
        string = string.decode('utf-8')

    return unidecode(string)

Python 2:

The same function works unchanged, because bytes is an alias for str in Python 2.

jose-t avatar Jul 15 '19 17:07 jose-t

@jose-t : Thank you! I was trying to solve the issue and found that translation (as opposed to transliteration) works better in some cases for Google Image search results. Depending on the query, either might give better results, but it's hard to predict which without comparing the results for a number of queries.

Posting a translate function in case someone wants to use it (note: you'll need a Google Cloud account and must follow the authentication steps: https://cloud.google.com/translate/docs/reference/libraries)

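# v2 client library; assumes the google-cloud-translate package is installed
from google.cloud import translate_v2 as translate

def translate_text_to_en(query):
    translate_client = translate.Client()
    # Pass text rather than encoded bytes: on Python 3 the client expects str
    translation = translate_client.translate(query, target_language='en')
    print(u'Translation: {}'.format(translation['translatedText']))
    return translation['translatedText']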

sudcha23 avatar Jul 15 '19 18:07 sudcha23

The bug is triggered when the script tries to print the item name after decoding it as UTF-8. If you are using Python 3, you can git clone the project and remove the decode in google_images_download/google_images_download.py, changing print(iteration.encode('raw_unicode_escape').decode('utf-8'))
into print(iteration.encode('raw_unicode_escape')). Then uninstall the original package and install the modified one with python setup.py install. The output looks wrong on screen, but the download works fine. Python 2 has a different problem. P.S. I still don't know why the example on the website, which uses Chinese, passes while other languages fail to decode.
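
One possible explanation for the difference between Chinese and some other languages on Python 3, offered as a sketch and not checked against the project itself: raw_unicode_escape writes code points above U+00FF as ASCII \uXXXX escapes, but writes code points from U+0080 to U+00FF as single Latin-1 bytes, which are not valid UTF-8 on their own. The function names below are hypothetical, and the sketch assumes the printed value is a str.

```python
def echo_like_the_script(text):
    """Mimic the encode/decode round trip from the print call quoted above."""
    return text.encode('raw_unicode_escape').decode('utf-8')


def survives_round_trip(text):
    """Return True if the round trip completes without a UnicodeDecodeError."""
    try:
        echo_like_the_script(text)
    except UnicodeDecodeError:
        return False
    return True


print(echo_like_the_script('สวัสดี'))  # prints backslash-u escapes, not Thai glyphs
print(survives_round_trip('你好'))     # every code point is above U+00FF
print(survives_round_trip('café'))     # 'é' becomes the lone byte 0xe9
```

Under these assumptions, Chinese and Thai text survive the round trip (printed as escapes), while accented Latin text such as 'café' does not. This does not account for every report in this thread.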

chzhc avatar Jul 23 '19 11:07 chzhc

Have you ever tried export PYTHONIOENCODING=utf-8?
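
PYTHONIOENCODING only changes the encoding Python uses for stdin, stdout and stderr, so it can help when printing raises a UnicodeEncodeError, but it would not affect a decode that fails before anything is printed. A small sketch for checking which case applies; stdout_can_encode is a hypothetical helper name.

```python
import sys


def stdout_can_encode(text, encoding=None):
    """Return True if `text` can be encoded with the given (or stdout's) encoding."""
    encoding = encoding or sys.stdout.encoding or 'ascii'
    try:
        text.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        return False
    return True


print(sys.stdout.encoding)
print(stdout_can_encode('สวัสดีครับ'))
```

If the second line already prints True, stdout is probably not the problem and the environment variable is unlikely to change anything.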

DuckSoft avatar Aug 07 '19 18:08 DuckSoft

@DuckSoft That doesn't fix it for me.

Geremia avatar Jul 01 '20 22:07 Geremia