Translating 1- or 2-character Chinese words
Thank you for making this amazing tool!!
I have a simple issue. I use textblob to translate Chinese. In Translate.py, the detect method (the one with the docstring """Detect the source text's language.""") requires a minimum length of 3; otherwise it throws an exception. This is not very handy for Chinese characters. For example: 好 means "it is good".
Maybe there is a way to change this without losing the effectiveness of detecting the correct language. If you can detect whether the input consists of Chinese characters, you could drop the minimum length requirement for it.
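A rough sketch of that idea, in Python 3. The function names are made up for illustration and are not part of textblob, and the check only covers the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), so the extension blocks would need extra ranges:

```python
def contains_chinese(text):
    """Return True if any character is in the basic CJK Unified Ideographs block."""
    return any('\u4e00' <= ch <= '\u9fff' for ch in text)


def long_enough_to_detect(text, min_length=3):
    """Hypothetical length check that waives the minimum for Chinese text."""
    if contains_chinese(text):
        # Any non-empty Chinese string is allowed through.
        return len(text) > 0
    return len(text) >= min_length
```

With something like this, long_enough_to_detect('好') would pass while long_enough_to_detect('f') would still be rejected.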
Thanks and keep up the good work :)
Interesting - the Google Translate API returns 500 Internal Server Error on strings fewer than 3 characters long, which is presumably the reason for this limitation in the code:
>>> from urllib import urlencode
>>> from urllib2 import urlopen, Request
>>> headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_6_8) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.168 Safari/535.19'}
>>> url='http://translate.google.com/translate_a/t'
>>> data=urlencode({'text': 'f', 'oe': 'UTF-8', 'client': 'p', 'ie': 'UTF-8'}).encode('utf-8')
>>> data
'text=f&oe=UTF-8&client=p&ie=UTF-8'
>>> urlopen(Request(url=url, headers=headers, data=data))
urllib2.HTTPError: HTTP Error 500: Internal Server Error
However, your example works fine:
>>> data=urlencode({'text': '好', 'oe': 'UTF-8', 'client': 'p', 'ie': 'UTF-8'}).encode('utf-8')
>>> data
'text=%E5%A5%BD&oe=UTF-8&client=p&ie=UTF-8'
>>> urlopen(Request(url=url, headers=headers, data=data))
<addinfourl at 140466353800976 whose fp = <socket._fileobject object at 0x7fc0e15d6850>>
We can probably just remove the restriction, and raise if we get a 500 Error from the API. I'll test this a bit more, and make the change if possible.
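Roughly what that could look like, as a Python 3 sketch rather than the actual textblob change. DetectionFailed and request_detection are hypothetical names, the opener argument is only there so the function can be exercised without network access, and the URL and parameters are copied from the session above (the endpoint may not respond the same way today):

```python
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint taken from the session above; treat it as an assumption.
URL = 'http://translate.google.com/translate_a/t'
HEADERS = {'User-Agent': 'Mozilla/5.0'}


class DetectionFailed(Exception):
    """Hypothetical error raised when the API cannot handle the input."""


def request_detection(text, opener=urlopen):
    """Send text to the API with no length check; raise on a 500 response."""
    params = {'text': text, 'oe': 'UTF-8', 'client': 'p', 'ie': 'UTF-8'}
    data = urlencode(params).encode('utf-8')
    request = Request(url=URL, headers=HEADERS, data=data)
    try:
        return opener(request)
    except HTTPError as error:
        if error.code == 500:
            # Translate the server-side failure into a library-level error.
            raise DetectionFailed('API returned 500 for %r' % text) from error
        raise
```

Any other HTTP error is re-raised unchanged, so only the 500 case gets turned into the library-level exception.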
I currently do the same to work around the issue. Maybe the Google API decodes the Chinese character in a way that results in a three-character-long word.
But great that you are looking into it. Thanks!
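On the guess above about the length: a quick Python 3 check shows that 好 is one character but three bytes once encoded as UTF-8, which matches the %E5%A5%BD in the request data. Whether the API actually counts bytes is not confirmed by anything in this thread; utf8_length is just a helper name used here.

```python
def utf8_length(text):
    """Number of bytes the text occupies once encoded as UTF-8."""
    return len(text.encode('utf-8'))


print(len('f'), utf8_length('f'))    # 1 1
print(len('好'), utf8_length('好'))  # 1 3
```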