
ensure asian characters are not broken

Open karussell opened this issue 12 years ago • 1 comments

This is now fixed! But needs a unit test!

From email:

The issue is in Converter.streamToString(). There's a loop that reads the HTTP data in chunks. Each chunk is converted to a String separately, but a chunk may contain only the first (or second) half of a multi-byte character, which results in corrupted data. It happens sporadically, depending on timing.
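A minimal sketch of the failure mode described above, not the actual snacktory code: the class and method names (`ChunkDecodingSketch`, `decodePerChunk`, `decodeOnce`) are hypothetical, and the sample string is made up for illustration. It assumes UTF-8 input and forces a chunk boundary into the middle of a character so the corruption is reproducible instead of timing dependent.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ChunkDecodingSketch {

    // Problematic pattern: every chunk is decoded on its own, so a
    // multi-byte character that straddles a chunk boundary is decoded
    // as two broken halves.
    static String decodePerChunk(InputStream in, Charset cs, int chunkSize) throws IOException {
        StringBuilder sb = new StringBuilder();
        byte[] buf = new byte[chunkSize];
        int n;
        while ((n = in.read(buf)) != -1) {
            sb.append(new String(buf, 0, n, cs));
        }
        return sb.toString();
    }

    // Safer pattern: collect the raw bytes first and decode once at the end.
    static String decodeOnce(InputStream in, Charset cs, int chunkSize) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[chunkSize];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), cs);
    }

    public static void main(String[] args) throws IOException {
        Charset cs = StandardCharsets.UTF_8;
        // Hypothetical sample: three CJK characters (3 bytes each in UTF-8) plus "300".
        String text = "\u65e5\u672c\u8a9e300";
        byte[] bytes = text.getBytes(cs);

        // A chunk size of 4 puts a boundary inside the second character.
        String broken = decodePerChunk(new ByteArrayInputStream(bytes), cs, 4);
        String intact = decodeOnce(new ByteArrayInputStream(bytes), cs, 4);

        System.out.println(text.equals(broken)); // prints false
        System.out.println(text.equals(intact)); // prints true
    }
}
```

Wrapping the stream in an `InputStreamReader` would be another way to get the same effect, since the reader keeps incomplete byte sequences buffered between reads.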

Also, the counting of bytesRead was wrong, so on a slow connection there could be a "size exceeded" message with no justification.
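The email does not say how exactly the count went wrong. One plausible cause, assumed here purely for illustration, is adding the buffer length on every iteration instead of the number of bytes that `read()` actually returned. On a slow connection each read returns only a few bytes, so such a counter overshoots quickly. The sketch below counts the returned bytes; `ByteCountSketch`, `TrickleStream`, and `readLimited` are hypothetical names, not snacktory API.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ByteCountSketch {

    // Simulates a slow connection: each bulk read yields at most one byte.
    static class TrickleStream extends FilterInputStream {
        TrickleStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, 1));
        }
    }

    // Reads the whole stream, failing only if the real payload exceeds maxBytes.
    static byte[] readLimited(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n; // count what was actually read, not buf.length
            if (total > maxBytes) {
                throw new IOException("size exceeded: " + total + " > " + maxBytes);
            }
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```

With this counting, a 100-byte body delivered one byte at a time passes a 100-byte limit, whereas a counter that added `buf.length` per iteration would trip the limit on the first read.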

To test this problem, I read a Japanese article (URL below) with the browser and saved its content somewhere (e.g. to a file). Then I ran the streamToString() function in a loop (with some delay) and each time compared its output with the expected output in the file. Sometimes I saw dozens of successful tests followed by several failures, so the problem is not very persistent, but the errors occurred often enough.
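The manual loop above depends on network timing. A deterministic variant, which might serve as a starting point for the missing unit test, could feed the same bytes through a stream that caps the size of each read and try every chunk size in turn. This is only a sketch: `ChunkBoundaryTestSketch`, `ChunkedStream`, and `countMismatches` are hypothetical names, and the local `streamToString` is a stand-in built on `InputStreamReader`, not the real `Converter.streamToString()`.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

public class ChunkBoundaryTestSketch {

    // Caps every bulk read at maxChunk bytes to force chunk boundaries.
    static class ChunkedStream extends FilterInputStream {
        private final int maxChunk;

        ChunkedStream(InputStream in, int maxChunk) {
            super(in);
            this.maxChunk = maxChunk;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, maxChunk));
        }
    }

    // Stand-in for the method under test (assumption: not the snacktory code).
    static String streamToString(InputStream in, Charset cs) throws IOException {
        Reader reader = new InputStreamReader(in, cs);
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    // Returns how many chunk sizes produced output that differs from expected.
    static int countMismatches(String expected, Charset cs) throws IOException {
        byte[] bytes = expected.getBytes(cs);
        int mismatches = 0;
        for (int chunk = 1; chunk <= bytes.length; chunk++) {
            InputStream in = new ChunkedStream(new ByteArrayInputStream(bytes), chunk);
            if (!expected.equals(streamToString(in, cs))) {
                mismatches++;
            }
        }
        return mismatches;
    }
}
```

A unit test would then assert that `countMismatches` returns 0 for a saved copy of the page, with the stand-in replaced by the real conversion method.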

The article I tested on is http://astand.asahi.com/magazine/wrscience/2012022900015.html, and the corruption was almost always visible in the string "300" (see in the article), where instead of the "3" some junk was displayed.

karussell, Mar 28 '12 08:03

see https://github.com/karussell/snacktory/commit/09c48a362c3652c2296e252b4cda42f13ed4aad7

karussell, Mar 28 '12 08:03