Add tests for encoding()

Open adamobeng opened this issue 9 years ago • 4 comments

adamobeng avatar Nov 01 '16 13:11 adamobeng

AFAICT, encoding2() detects 'ISO-8859-1' for all of the differently-encoded test files we have: https://travis-ci.org/kbenoit/readtext/builds/172649316

adamobeng avatar Nov 02 '16 15:11 adamobeng

Note: It's only called encoding2 to prevent NAMESPACE conflicts with quanteda. Let's drop the "2" once we remove the original function from quanteda.

kbenoit avatar Nov 02 '16 19:11 kbenoit

stri_enc_detect doesn't seem to work at all on any of the encodings we have example files for. It could be because our files include the whole charset rather than actual usage examples, but still.
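
One way to check the charset-table hypothesis is sketched below. It is only a rough experiment, not something taken from the readtext test suite: the sample sentence, the byte range used for the "table", and the variable names are all made up for illustration. It assumes `stringi` is installed and uses `stri_encode()` and `stri_enc_detect()`. The guesses depend on the ICU version, so no expected output is shown.

```r
library(stringi)

# Hypothetical "actual usage" sample: ordinary German prose with non-ASCII
# letters, written with \u escapes so the script itself stays ASCII.
sentence <- paste(
  "Der B\u00e4r a\u00df fr\u00fch zw\u00f6lf s\u00fc\u00dfe \u00c4pfel.",
  "Danach lief er \u00fcber die Stra\u00dfe und schlief bis zum Abend.",
  "Am n\u00e4chsten Morgen war er wieder hungrig."
)

# Re-encode the sample as ISO-8859-1 and keep the raw bytes.
latin1_bytes <- stri_encode(sentence, from = "UTF-8", to = "ISO-8859-1",
                            to_raw = TRUE)[[1]]

# Hypothetical "whole charset" sample: each printable ISO-8859-1 byte once,
# roughly what a file that just lists the character set would contain.
table_bytes <- as.raw(c(0x20:0x7e, 0xa0:0xff))

# stri_enc_detect() returns one result per input; each result lists
# candidate encodings with a language guess and a confidence score.
guess_sentence <- stri_enc_detect(latin1_bytes)[[1]]
guess_table <- stri_enc_detect(table_bytes)[[1]]

# Compare the top candidates for the two kinds of input.
print(head(guess_sentence, 3))
print(head(guess_table, 3))
```

If the two inputs get noticeably different guesses or confidence scores, that would suggest the test files need realistic text rather than a charset dump before any detector can be judged against them.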

adamobeng avatar Nov 03 '16 15:11 adamobeng

On Kohei's suggestion, check out https://github.com/haven-jeon/Ruchardet/

adamobeng avatar Nov 13 '16 15:11 adamobeng