Adam Obeng
Thanks for flagging this, @SebastianRiechert! I've asked @2timesjay, who's most familiar with this part of the code, to look into it.
Implemented in [PR #52](https://github.com/kbenoit/readtext/pull/52)
By my reckoning:

- get_txt, get_csv, get_json_tweets, get_json_lines use readLines.
- get_json_object uses jsonlite::fromJSON
- get_XML uses XML::xmlTreeParse or XML::xmlToDataFrame
- get_html, get_docx use XML::htmlTreeParse
- get_pdf uses the pdf2text...
I should also note that we don't currently "include functions for diagnosing encodings on a file-by-file basis", because the stringi encoding detection stuff is not currently exposed.
AFAICT, `encoding2()` detects 'ISO-8859-1' for all of the differently-encoded test files we have: https://travis-ci.org/kbenoit/readtext/builds/172649316
`stri_enc_detect` doesn't seem to work at all on any of the encodings we have example files for. It could be because our files include the whole charset rather than actual...
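One way to check whether the content of the example files, rather than the detector, is the problem would be to run `stri_enc_detect` on a sentence of ordinary prose in a known encoding. This is only a rough sketch, assuming stringi is installed; the sample sentence and the variable names are made up for illustration and are not taken from the readtext test suite.

```r
library(stringi)

# Hypothetical sample: a short French sentence, written with \u escapes so
# that this script itself stays pure ASCII.
sample_text <- "Les \u00e9l\u00e8ves \u00e9tudient \u00e0 l'\u00e9cole pr\u00e8s de la for\u00eat."

# Re-encode the sentence to ISO-8859-1 and keep the result as raw bytes.
latin1_bytes <- stri_encode(sample_text, from = "UTF-8", to = "ISO-8859-1",
                            to_raw = TRUE)[[1]]

# stri_enc_detect() returns one data frame per input, listing candidate
# encodings with a confidence score for each.
guesses <- stri_enc_detect(latin1_bytes)[[1]]
print(head(guesses, 3))
```

If the guesses look sensible for prose like this but not for the example files, that would point at the files' content rather than at the detector.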
On Kohei's suggestion, check out https://github.com/haven-jeon/Ruchardet/
Thanks @christoomey! I understand the desire to keep things simple. Are you suggesting that users would define functions to send the bracketed paste characters in their own configuration files? That...