snacktory
ensure asian characters are not broken
This is now fixed! But needs a unit test!
From email:
The issue is in Converter.streamToString(). It reads the HTTP data in a loop, chunk by chunk. Each chunk is converted to a String separately, but a chunk may end with only the first half of a multi-byte character (or begin with the second half), which results in corrupted data. It happens sporadically, depending on timing.
Also, the counting of bytesRead was wrong, so on a slow connection a "size exceeded" message could appear without justification.
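To make the failure mode concrete, here is a minimal sketch, not the actual snacktory code. The class and method names (ChunkedDecoding, decodePerChunk, decodeWhole) and the maxBytes parameter are assumptions for illustration. It contrasts decoding every chunk on its own with collecting all bytes first and decoding once, and it counts only the bytes each read() call actually returned.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Illustrative only: names and limits are assumptions, not snacktory's API.
public class ChunkedDecoding {

    // Problematic pattern: each chunk is decoded separately, so a multi-byte
    // character split across two chunks turns into replacement characters.
    static String decodePerChunk(InputStream in, int chunkSize) throws IOException {
        StringBuilder sb = new StringBuilder();
        byte[] buf = new byte[chunkSize];
        int n;
        while ((n = in.read(buf)) != -1) {
            sb.append(new String(buf, 0, n, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Safer pattern: accumulate raw bytes and decode once at the end.
    // The size check uses the number of bytes actually read, not the buffer size.
    static String decodeWhole(InputStream in, int chunkSize, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[chunkSize];
        int total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
            if (total > maxBytes) {
                throw new IOException("size exceeded");
            }
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

With a three-character Japanese string (nine bytes in UTF-8) and a chunk size of 4, decodePerChunk cuts characters in the middle and returns a damaged string, while decodeWhole returns the original text.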
To test this problem, I read a Japanese article (URL below) with the browser and saved its content (e.g. to a file). I then ran the streamToString() function in a loop (with some delay) and compared its output each time with the expected output in the file. Sometimes I saw dozens of successful runs followed by several failures, so the problem is not very persistent, but the errors occurred often enough.
The article I tested on is http://astand.asahi.com/magazine/wrscience/2012022900015.html, and the corruption was almost always visible in the string "300" (see the article), where some junk was displayed instead of the "3".
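Since the failure depends on timing, a unit test could force the split instead of waiting for it. The following is a rough sketch under assumptions: TrickleInputStream is a hypothetical helper that hands out only a few bytes per read(), and readAll is a stand-in for the method under test (in the real test it would be replaced by the call to streamToString(), whose exact signature is not shown here).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Illustrative only: all names here are assumptions, not part of snacktory.
public class SplitReadCheck {

    // Hypothetical helper: returns at most `limit` bytes per read(),
    // imitating a slow connection that delivers data in small pieces.
    static class TrickleInputStream extends FilterInputStream {
        private final int limit;

        TrickleInputStream(InputStream in, int limit) {
            super(in);
            this.limit = limit;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, limit));
        }
    }

    // Stand-in for the method under test: collects bytes, decodes once.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[2048];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // Tries every chunk size from 1 byte up to the full length, so every
    // possible split point inside a multi-byte character is exercised.
    static boolean roundTripsForAllSplits(String expected) throws IOException {
        byte[] data = expected.getBytes(StandardCharsets.UTF_8);
        for (int limit = 1; limit <= data.length; limit++) {
            InputStream in = new TrickleInputStream(new ByteArrayInputStream(data), limit);
            if (!expected.equals(readAll(in))) {
                return false;
            }
        }
        return true;
    }
}
```

Feeding a short Japanese sample through roundTripsForAllSplits makes the test deterministic: a reader that decodes chunk by chunk fails for small limits, while one that decodes the complete byte array passes for all of them.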
see https://github.com/karussell/snacktory/commit/09c48a362c3652c2296e252b4cda42f13ed4aad7