readtext
Add tests for encoding()
As far as I can tell, encoding2() reports 'ISO-8859-1' for every one of our test files, regardless of the encoding each file is actually in: https://travis-ci.org/kbenoit/readtext/builds/172649316
Note: It's only called encoding2 to prevent NAMESPACE conflicts with quanteda. Let's drop the "2" once we remove the original function from quanteda.
stri_enc_detect() doesn't seem to work on any of the encodings we have example files for. That could be because our files contain the whole charset rather than realistic text, but it is still a problem for the tests.
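To make the charset-dump hypothesis concrete, here is a minimal, language-agnostic sketch in Python (not R, and not how encoding2() or stri_enc_detect() work internally). It assumes a hypothetical test file that simply lists every upper-half byte once; our real files may be laid out differently. Under that assumption the file is valid in several single-byte encodings at once and has a flat byte distribution, so a detector has nothing to go on, whereas realistic text yields encoding-specific bytes.

```python
from collections import Counter

# Hypothetical stand-in for a "whole charset" test file: every byte value
# in the upper half (0xA0-0xFF) exactly once, in order. This layout is an
# assumption, not a description of the actual readtext test files.
charset_dump = bytes(range(0xA0, 0x100))

# Single-byte encodings whose 0xA0-0xFF range is fully assigned.
candidates = ["iso-8859-1", "iso-8859-2", "iso-8859-5", "koi8-r"]


def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if `data` is valid under `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# Which candidates accept the dump? Validity alone cannot separate them
# if more than one encoding ends up in this list.
valid_under = [enc for enc in candidates if decodes_cleanly(charset_dump, enc)]

# Highest count of any single byte value; a flat profile gives a
# frequency-based detector nothing to match against.
max_byte_count = max(Counter(charset_dump).values())

# Realistic text, by contrast, can be compared byte-for-byte across encodings.
sample = "Привет, мир"
koi8_bytes = sample.encode("koi8-r")
iso5_bytes = sample.encode("iso-8859-5")

print(valid_under)
print(max_byte_count)
print(koi8_bytes != iso5_bytes)
```

If this is what is happening, replacing the charset dumps with short passages of real text in each encoding would give the detectors a fair test.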
On Kohei's suggestion, check out https://github.com/haven-jeon/Ruchardet/