
Error with "arguments imply differing number of rows" when using network_data() after get_timeline()

Open · d-schafer opened this issue · 1 comment

Problem

I'm calling get_timeline() and then network_data() in a for loop to request each user's timeline and extract its network data. The code looks like this (in this example, the list holds a user ID that produces the error):

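library(rtweet)

# Example list with one user ID that produces the error; IDs are given as
# strings because large numeric IDs can lose precision in R
list_userids <- c("1016593091480948736")
net_all <- NULL  # accumulates the network data of all users
for (i in list_userids) {
  twt <- rtweet::get_timeline(
    user = i,
    n = 500,
    retryonratelimit = TRUE
  )
  # extracting network data
  net <- network_data(twt, e = "all")
  # row binding net data
  net_all <- rbind(net_all, net)
}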

The error occurs when running network_data():

Error in data.frame(from = um$id_str, to = ur$id_str, type = "retweet") : 
  arguments imply differing number of rows: 497, 499, 1
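The message itself comes from base R's data.frame(), which recycles a shorter column only when its length divides the number of rows of the longest column. A minimal reproduction without rtweet, using made-up IDs where one retweeted user is missing (as would happen with a deleted account):

```r
# Made-up IDs: three retweeting users, but the retweeted user of one tweet
# is missing, so the two ID vectors have different lengths.
from <- c("101", "102", "103")
to <- c("201", "202")

# 3 is not a multiple of 2, so data.frame() refuses to build the table
# (the length-1 "type" column is recycled without problems).
msg <- tryCatch(
  data.frame(from = from, to = to, type = "retweet"),
  error = function(e) conditionMessage(e)
)
print(msg)

# By contrast, lengths that divide evenly are recycled silently, which
# pairs IDs incorrectly instead of failing.
recycled <- data.frame(from = c("101", "102", "103", "104"), to = c("201", "202"))
nrow(recycled)
```

In general, a length mismatch between the `from` and `to` vectors only surfaces as an error when the lengths do not divide evenly, as in the 497 vs. 499 case above.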

Expected behavior

The loop runs for most users, but the error occurs in roughly 1 of every 5 iterations: list_userids contains about 800 IDs, and the error occurred about 160 times.

The row counts in the error differ from user to user. For example, running the same get_timeline() and network_data() loop with user ID 231751795 gives: arguments imply differing number of rows: 334, 335, 1

rtweet version

I am using rtweet 1.0.2, R 4.2.1 and RStudio 2022.07.1 Build 554.

Please let me know if you need more information.

d-schafer · Aug 14 '22 19:08

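Many thanks for the report. It seems this is caused by tweets involving deleted accounts: the problem comes up with a couple of tweets that retweet a tweet by a user whose account has been deleted, so the retweeting and retweeted user IDs end up with different lengths (497 vs. 499 in the error above). Because the content of deleted accounts is also deleted, Twitter's developer agreement and policy says that you should: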

B. Removals. If Twitter Content is deleted, gains protected status, or is otherwise suspended, withheld, modified, or removed from the Twitter Applications (including removal of location information), you will make all reasonable efforts to delete or modify such Twitter Content (as applicable) as soon as possible, and in any case within 24 hours after a written request to do so by Twitter or by a Twitter user with regard to their Twitter Content, unless prohibited by applicable law or regulation and with the express written permission of Twitter.

... If you store Twitter Content offline, you must keep it up to date with the current state of that content on Twitter. Specifically, you must delete or modify any content you have if it is deleted or modified on Twitter. This must be done as soon as reasonably possible, or within 24 hours after receiving a request to do so by Twitter or the applicable Twitter account owner, or as otherwise required by your agreement with Twitter or applicable law. This must be done unless otherwise prohibited by law, and only then with the express written permission of Twitter.

Modified content can take various forms. This includes (but is not limited to):

  • Content that has been made private or gained protected status

  • Content that has been suspended from the platform

  • Content that has had geotags removed from it

  • Content that has been withheld or removed from Twitter

I will therefore omit these tweets from the results from now on. I may keep a count of how many relations are affected, but not of which users are involved.
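The approach described here (drop the affected relations, but keep a count of how many were dropped) could look roughly like the sketch below. It is not rtweet's actual implementation: it assumes a hypothetical input with one entry per retweet, where the retweeted user's ID is NA when that account's data is missing, and the function name `retweet_edges()` and the `n_omitted` attribute are illustrative only.

```r
# Hypothetical sketch, not rtweet's code: build retweet edges from paired
# vectors (one element per retweet) and drop pairs with a missing ID, so
# `from` and `to` can never drift apart in length.
retweet_edges <- function(retweeter_id, retweeted_user_id) {
  stopifnot(length(retweeter_id) == length(retweeted_user_id))
  keep <- !is.na(retweeter_id) & !is.na(retweeted_user_id)
  edges <- data.frame(
    from = retweeter_id[keep],
    to = retweeted_user_id[keep],
    # rep() keeps this column valid even when no rows are kept
    type = rep("retweet", sum(keep))
  )
  # Record how many relations were omitted, but not which users
  attr(edges, "n_omitted") <- sum(!keep)
  edges
}

edges <- retweet_edges(
  retweeter_id = c("101", "102", "103"),
  retweeted_user_id = c("201", NA, "203")
)
nrow(edges)               # 2
attr(edges, "n_omitted")  # 1
```

Filtering both IDs together before building the data frame is what avoids the original error, which arises when the two columns are extracted separately and one of them loses entries.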

I'm fixing this for retweets in network_data(), but tweets from deleted accounts might also affect the other relations (mentions, replies, quotes). If you find more users or tweets that trigger the problem, please let me know.
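To keep a long loop like the one in the report running while collecting the IDs of users that still fail (so they can be reported), each user's request can be wrapped in tryCatch(), and the results bound once at the end instead of growing `net_all` with rbind() inside the loop. A minimal sketch: `collect_networks()` and its `fetch_network` argument are hypothetical names, not part of rtweet, and the function accepts any per-user function so it can be tried without API access.

```r
# Hypothetical helper (not part of rtweet): call fetch_network() for each
# user ID, keep going when a call errors, and return the combined results
# together with the IDs that failed.
collect_networks <- function(user_ids, fetch_network) {
  # Keep the error object instead of stopping the whole run
  results <- lapply(user_ids, function(id) {
    tryCatch(fetch_network(id), error = function(e) e)
  })
  is_error <- vapply(results, inherits, logical(1), what = "error")
  # Report each failing user and its error message
  for (k in which(is_error)) {
    message("Skipping user ", user_ids[[k]], ": ", conditionMessage(results[[k]]))
  }
  # Bind all successful results in a single rbind() call
  list(
    network = do.call(rbind, results[!is_error]),
    failed = user_ids[is_error]
  )
}

# Real use (requires rtweet authentication; not run here):
# out <- collect_networks(list_userids, function(id) {
#   twt <- rtweet::get_timeline(user = id, n = 500, retryonratelimit = TRUE)
#   network_data(twt, e = "all")
# })
# net_all <- out$network
# out$failed  # user IDs to report
```

Binding once with do.call(rbind, ...) also avoids copying the growing data frame on every iteration.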

llrs · Aug 15 '22 16:08