Adam Obeng comments

8 comments by Adam Obeng

Visualization in tutorial not working in colab and remote setup

Thanks for flagging this, @SebastianRiechert! I've asked @2timesjay, who's most familiar with this part of the code, to look into it.

readtext doesn't perform Unicode normalization

Implemented in [PR #52](https://github.com/kbenoit/readtext/pull/52)
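
For readers unfamiliar with the issue, here is a minimal sketch of what Unicode normalization changes. It is written in Python rather than R and is not taken from PR #52; the helper name `normalize_text` and the choice of NFC as the default form are assumptions for illustration only.

```python
import unicodedata


def normalize_text(text: str, form: str = "NFC") -> str:
    """Return `text` in the requested Unicode normalization form.

    Hypothetical helper for illustration; not part of readtext.
    """
    return unicodedata.normalize(form, text)


# Two ways of writing the same visible character.
composed = "\u00e9"      # single precomposed code point
decomposed = "e\u0301"   # letter e followed by a combining acute accent

# Without normalization the strings compare unequal.
print(composed == decomposed)

# After normalizing both to the same form they compare equal.
print(normalize_text(composed) == normalize_text(decomposed))
```

Text read from different sources can mix both representations, so comparing or tokenizing it without normalizing first may treat identical-looking strings as different.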

Encoding handling not handled by stringi and possibly inconsistent

By my reckoning,

- get_txt, get_csv, get_json_tweets, get_json_lines use readLines.
- get_json_object uses jsonlite::fromJSON
- get_XML uses XML::xmlTreeParse or XML::xmlToDataFrame
- get_html, get_docx use XML::htmlTreeParse
- get_pdf uses the pdf2text...

Encoding handling not handled by stringi and possibly inconsistent

I should also note that we don't currently "include functions for diagnosing encodings on a file-by-file basis", because the stringi encoding detection stuff is not currently exposed.

Add tests for encoding()

AFAICT, `encoding2()` detects 'ISO-8859-1' for all of the differently-encoded test files we have: https://travis-ci.org/kbenoit/readtext/builds/172649316

Add tests for encoding()

`stri_enc_detect` doesn't seem to work at all on any of the encodings we have example files for. It could be because our files include the whole charset rather than actual...
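
One way to see why a file containing a whole charset may give a detector little to work with: the same high bytes are valid in several single-byte encodings at once. The sketch below is in Python, is not how stringi detects encodings, and uses a candidate list and helper name (`decodable_as`) chosen purely for illustration.

```python
# Every byte from 0xA0 to 0xFF, as a stand-in for a "whole charset" test file.
HIGH_BYTES = bytes(range(0xA0, 0x100))

# Candidate encodings chosen for illustration (an assumption, not the
# encodings of the actual test files).
CANDIDATES = ["iso-8859-1", "iso-8859-15", "iso-8859-5", "koi8-r"]


def decodable_as(data: bytes, encodings: list[str]) -> list[str]:
    """Return the encodings under which `data` decodes without error."""
    valid = []
    for name in encodings:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        valid.append(name)
    return valid


# All candidates accept the bytes, so validity alone cannot tell them apart.
print(decodable_as(HIGH_BYTES, CANDIDATES))

# The decoded text still differs between encodings.
print(HIGH_BYTES.decode("iso-8859-1") == HIGH_BYTES.decode("iso-8859-15"))
```

Because every candidate decodes the bytes successfully, a detector has to rely on how plausible the resulting text looks, which is hard to judge when the input is a character table rather than natural-language text.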

Add tests for encoding()

On Kohei's suggestion, check out https://github.com/haven-jeon/Ruchardet/

Add bracketed paste option (fixes #65)

Thanks @christoomey! I understand the desire to keep things simple. Are you suggesting that users would define functions to send the bracketed paste characters in their own configuration files? That...