[Feature suggestion] Use twitter-text package to provide tweet-checking functionality

Open wurli opened this issue 1 year ago • 6 comments

twitter-text is a repo made public by Twitter which contains the functionality they use for parsing and checking tweets. This could be easily bundled with {rtweet}, e.g. by using {V8} to wrap the JavaScript package in the repo.

Here are some examples of functions that could be easily created by wrapping this library:

  • tweet_weighted_length(): Get the length of a tweet based on Twitter's parsing rules (e.g. every link counts as 23 characters)
  • tweet_permillage(): Get the ratio of consumed weighted length to the maximum weighted length
  • tweet_is_valid(): Determine whether text is a valid tweet
  • tweet_extract_mentions(): Get any @ mentions present in a tweet
  • tweet_autolink(): Parse a tweet to HTML with links automatically inserted for any @,#,$ tags

If you think this sounds like a good feature I'm happy to begin a pull request. Otherwise I may create a standalone package. However I think {rtweet} is a better place for this functionality.

Thanks for the hard work on this great package!

wurli avatar Sep 29 '22 15:09 wurli

Hi Jacob, thanks for using the package and the suggestions.

I have some comments about it:

  1. I am not familiar with JavaScript and I could not maintain it: Would you keep the code up to date and working? Or at least help me stay up to date with JavaScript?
  2. It might be hard to include this external library while keeping the package on CRAN (it might not). But in (software) engineering I prefer to stick to the golden rule of keeping it simple. Do you have experience with that?
  3. Most of this could be implemented in R or is already implemented: extracting mentions is already done by the API; you just need to check out$entities[[1]]$user_mentions, or links, hashtags and other entities such as media in out$entities[[1]]$media, ... It would be easy to create a link for those, to be opened in the browser, with R and without any dependency.
  4. While I think it is fine to have some functions beyond those wrapping the API, I do not want to expand that surface much: other packages and users might later come to depend on it, and it might become very hard to keep it up to date for them.
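
As an illustration of point 3, mentions can also be pulled out of raw text offline with base R. This is only a minimal sketch: the function name extract_mentions_sketch() and the regular expression are hypothetical, and the pattern only approximates Twitter's rules (ASCII letters, digits and underscores, at most 15 characters). It is neither rtweet's nor twitter-text's implementation.

```r
# Hypothetical helper: extract @mentions from raw tweet text with base R only.
# Assumption: a handle is 1-15 ASCII letters, digits or underscores and is
# not directly preceded by a word character (so e-mail addresses are skipped).
extract_mentions_sketch <- function(text) {
  pattern <- "(?<![A-Za-z0-9_])@[A-Za-z0-9_]{1,15}(?![A-Za-z0-9_])"
  hits <- regmatches(text, gregexpr(pattern, text, perl = TRUE))
  # Drop the leading "@" from every match; one list element per input string
  lapply(hits, function(x) sub("^@", "", x))
}

extract_mentions_sketch("Thanks @wurli and @llrs, mail me at a@b.com")
```

The result is a list with one character vector per input string, so it can be applied to a whole column of tweets at once.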

However, I agree that there is currently no implemented way to check whether a tweet is valid. A function that checks whether a message meets the Twitter length rule and reports the permillage could be a nice addition, as so far the package has focused on extracting data and not much on posting to Twitter. Do you know where in the twitter-text code these rules are, and where they are documented?
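
A dependency-free length check along these lines might look like the sketch below. It is a simplification, not a port of twitter-text: the function names are hypothetical, and the constants (a 280 weighted-character limit, 23 per URL, weight 1 for a few low code-point ranges and weight 2 otherwise) are assumptions recalled from twitter-text's published configuration that should be verified against it. Emoji handling, Unicode normalisation and twitter-text's own URL detection are ignored.

```r
# Simplified weighted length: URLs count as a fixed length, "light" code
# points weigh 1, everything else (e.g. CJK characters) weighs 2.
# All constants are assumptions, see the note above.
tweet_weighted_length_sketch <- function(text, url_length = 23) {
  url_pattern <- "https?://[^[:space:]]+"   # crude stand-in for URL detection
  n_urls <- lengths(regmatches(text, gregexpr(url_pattern, text)))
  stripped <- gsub(url_pattern, "", text)
  char_weight <- vapply(stripped, function(x) {
    cp <- utf8ToInt(x)
    light <- cp <= 4351 | (cp >= 8192 & cp <= 8205) |
      (cp >= 8208 & cp <= 8223) | (cp >= 8242 & cp <= 8247)
    sum(1 + !light)                         # 1 for light, 2 for heavy
  }, numeric(1), USE.NAMES = FALSE)
  char_weight + n_urls * url_length
}

# Share of the maximum weighted length that is used, in parts per thousand
tweet_permillage_sketch <- function(text, max_length = 280) {
  floor(tweet_weighted_length_sketch(text) / max_length * 1000)
}

# A tweet is treated as valid if it is non-empty and within the limit
tweet_is_valid_sketch <- function(text, max_length = 280) {
  len <- tweet_weighted_length_sketch(text)
  len > 0 & len <= max_length
}

tweet_weighted_length_sketch("see https://example.com/a/long/path")
```

All three functions are vectorised over text, so a character vector of draft tweets can be checked in one call.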

llrs avatar Sep 29 '22 15:09 llrs

Hi Lluís, thanks for responding!

> I am not familiar with JavaScript and I could not maintain it: Would you keep the code up to date and working? Or at least help me stay up to date with JavaScript?

Certainly! I think this would require very little work, but of course I'd be happy to help if needed. Without going into too much detail, wrapping the existing JavaScript functionality is rather simple.

> It might be hard to include this external library while keeping the package on CRAN (it might not). But in (software) engineering I prefer to stick to the golden rule of keeping it simple. Do you have experience with that?

I completely agree with you - it's not much use including a nice feature if CRAN remove the package. However, I don't think this will pose a challenge either. JavaScript for R gives a nice primer on how {V8} can be used to wrap JavaScript libraries in packages. As far as I can see, the biggest potential issue is licensing. Fortunately, however, twitter-text uses the Apache 2.0 license, which is very permissive, so I don't think this would be an issue either.

> Most of this could be implemented in R or is already implemented: extracting mentions is already done by the API; you just need to check out$entities[[1]]$user_mentions, or links, hashtags and other entities such as media in out$entities[[1]]$media, ... It would be easy to create a link for those, to be opened in the browser, with R and without any dependency.

Aha! Yes, that's very true. However, I do think it's nice to have this functionality available without querying the Twitter API - especially if you have a lot of tweets to process. I've been pleasantly surprised by how fast the JavaScript library is - even when called from R.

> While I think it is fine to have some functions beyond those wrapping the API, I do not want to expand that surface much: other packages and users might later come to depend on it, and it might become very hard to keep it up to date for them.

I don't think this is too much of a risk. The twitter-text JavaScript package was last updated 3 years ago, which suggests that it's very stable by now. Also, being able to bundle the code itself means that {rtweet} developers would be fully in control of how potential updates are handled. That said, I agree the concern is valid, and I think a decision to limit the wrapped functionality would be reasonable. I think tweet_is_valid() and tweet_length() would be by far the most useful functions anyway.

A demo

I just mocked up a quick package to demonstrate just how painless it is to wrap the JavaScript library - it took me all of 30 minutes! Examples are given in the repo's README.md. You'll see that the JavaScript portion is contained in a single file sitting in inst/extdata - it's only necessary to include a single compiled JavaScript file from the twitter-text repo. The only bit of boilerplate is in zzz.R, where the V8 engine is initialised. By the way, I think {V8} could definitely be added as a suggested package rather than a hard dependency - I'm sure many users wouldn't ever need this functionality.
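
For readers unfamiliar with {V8}, the general wrapping pattern can be sketched as follows. This is not the code from the demo package: the function js_char_count() and the toy JavaScript function charCount are hypothetical stand-ins for the bundled twitter-text file, which is not reproduced here. The requireNamespace() check shows one way {V8} could be treated as a suggested package.

```r
# Sketch only: a toy JavaScript function stands in for the bundled
# twitter-text file. {V8} is assumed to be in Suggests, so availability
# is checked at call time.
js_char_count <- function(text) {
  if (!requireNamespace("V8", quietly = TRUE)) {
    stop("Package 'V8' is required for this function.", call. = FALSE)
  }
  # In a package, the context would typically be created once (e.g. in zzz.R)
  ctx <- V8::v8()
  # A real wrapper would load the bundled file here, e.g. via ctx$source()
  ctx$eval("function charCount(text) { return Array.from(text).length; }")
  # Call the JavaScript function with an R argument; the result comes back as R data
  ctx$call("charCount", text)
}
```

Creating the context on every call keeps the sketch self-contained; a package would normally reuse a single context for speed.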

wurli avatar Sep 29 '22 21:09 wurli

I've seen that you use this to post content of irregular length with a bot. I will try to provide something similar to what you currently have but without any JavaScript dependency. This will avoid any potential licensing conflicts and make it maintainable by myself.

llrs avatar Sep 30 '22 18:09 llrs

Sounds good. Thanks for entertaining the idea and best of luck with the R implementation!

wurli avatar Sep 30 '22 20:09 wurli

I'll leave the issue open as a reminder. Thanks for the suggestion.

llrs avatar Sep 30 '22 23:09 llrs

Quick update for any lurkers. If you do want to use the JavaScript code from twitter-text, you can do so using {tweetcheck}. Installation:

# From GitHub
remotes::install_github("wurli/tweetcheck")

# From CRAN (in a day or two)
install.packages("tweetcheck")

wurli avatar Oct 06 '22 16:10 wurli