
Twitter Rate Limit is still an issue

Open T-prog3 opened this issue 2 years ago • 16 comments

Describe the bug

I'm still being rate limited on the Twitter API, with the suggestion to lower the connections in Settings. This, however, makes no difference at all. I tried as low as 10 for Number of connections in 60s with only 1 concurrent connection. To my understanding of the Twitter API rate limits (https://developer.twitter.com/en/docs/twitter-api/rate-limits), this shouldn't be an issue.

This also raises the question of whether these settings only affect the Tumblr API. Should Tumblr and Twitter really be treated under the same settings and name?

And shouldn't there also be a way to authenticate a Twitter account? This would allow you to crawl users that only allow followers.
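To illustrate what a "Number of connections in 60s" setting implies, here is a minimal sketch of a windowed rate limiter in C#. This is purely illustrative and not TumblThree's actual RateLimiter implementation; all names are made up:

```csharp
// Sketch of an "N requests per 60 s" limiter, as the Settings ->
// Connection pane suggests. Illustrative only.
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed class WindowedRateLimiter
{
    private readonly int maxRequests;                 // e.g. 10
    private readonly TimeSpan window;                 // e.g. 60 seconds
    private readonly Queue<DateTime> stamps = new Queue<DateTime>();
    private readonly SemaphoreSlim gate = new SemaphoreSlim(1, 1);

    public WindowedRateLimiter(int maxRequests, TimeSpan window)
    {
        this.maxRequests = maxRequests;
        this.window = window;
    }

    // Call before every API request; returns once a slot is free.
    public async Task WaitAsync()
    {
        await gate.WaitAsync();
        try
        {
            while (true)
            {
                // Drop timestamps that have left the window.
                while (stamps.Count > 0 && DateTime.UtcNow - stamps.Peek() > window)
                    stamps.Dequeue();

                if (stamps.Count < maxRequests) break;

                TimeSpan delay = window - (DateTime.UtcNow - stamps.Peek());
                await Task.Delay(delay > TimeSpan.Zero ? delay : TimeSpan.FromMilliseconds(50));
            }
            stamps.Enqueue(DateTime.UtcNow);
        }
        finally
        {
            gate.Release();
        }
    }
}

// Usage: var limiter = new WindowedRateLimiter(10, TimeSpan.FromSeconds(60));
//        await limiter.WaitAsync();  // then issue the request
```

Even at 10 requests per 60 s, a server-side limit measured over a different window (e.g. 15 minutes) could still be hit, which may explain why lowering the setting makes no visible difference.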

Desktop (please complete the following information):

  • TumblThree version: v2.5.0

T-prog3 avatar Feb 05 '22 13:02 T-prog3

Today, and right at this moment, it's working (here). Had you already downloaded a bit when you got the error? Then it would be a real "limit exceeded". Or your current IP may be blocked for some reason. Or they are rolling out a change that you already see and other regions will see soon.

The settings affect the crawlers for both. Also, the default settings are, in absolute terms, a bit too high for Twitter's API limits, but they work for normal crawling/downloading because of the time spent between requests. Obviously there are no separate settings yet; maybe they'll be needed in the future.

There is room for improvement. Contributions are welcome.
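A sketch of what separate per-service settings could look like; every name here is hypothetical, nothing like this exists in the current settings UI:

```csharp
// Hypothetical per-service limits, if Tumblr and Twitter were split.
// None of these property names exist in TumblThree today.
using System;

public class ServiceConnectionSettings
{
    public int ConcurrentConnections { get; set; }
    public int MaxApiRequestsPerWindow { get; set; }
    public TimeSpan Window { get; set; } = TimeSpan.FromSeconds(60);
}

public class AppSettings
{
    // Today there is one shared set of limits; a future version could
    // keep one instance per crawler backend instead.
    public ServiceConnectionSettings Tumblr { get; set; } =
        new ServiceConnectionSettings { ConcurrentConnections = 8, MaxApiRequestsPerWindow = 90 };

    public ServiceConnectionSettings Twitter { get; set; } =
        new ServiceConnectionSettings { ConcurrentConnections = 1, MaxApiRequestsPerWindow = 30 };
}
```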

thomas694 avatar Feb 05 '22 16:02 thomas694

I have actually been trying to update one user that was already downloaded once, two weeks ago. The error happens in the first minute of running, during "Evaluated N tumblr posts out of N total posts". It doesn't download anything new, and then I get: Error 1: Limit exceeded: username You should lower the connections to the tumblr api in the Settings->Connection pane.

Then I get the message "waiting until date/time", but at that time it only pushes the date/time forward and doesn't make any progress, even after 1 hour. So there appears to be no way to make a complete update of already downloaded users (as of today). My Last Complete Crawl will stay stuck at 2022-01-20.

T-prog3 avatar Feb 05 '22 18:02 T-prog3

Please open this blog in the browser and tell me when the first two posts have been posted. Do you have "force rescan" enabled in this blog's settings? What is the value of LastId in this blog's index file?
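The index file is plain JSON, so LastId can be checked with a few lines. A minimal sketch, assuming the property is named LastId as discussed here; the file path and layout may differ between versions:

```csharp
// Minimal sketch: print LastId from a blog's index file.
// Assumes a JSON file with a top-level "LastId" property.
using System;
using System.IO;
using System.Text.RegularExpressions;

class ShowLastId
{
    static void Main(string[] args)
    {
        // Hypothetical default path; pass the real index file as an argument.
        string path = args.Length > 0 ? args[0] : @"Index\blogname.twitter";
        string json = File.ReadAllText(path);
        Match m = Regex.Match(json, "\"LastId\"\\s*:\\s*\"?(\\d+)\"?");
        Console.WriteLine(m.Success ? m.Groups[1].Value : "LastId not found");
    }
}
```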

thomas694 avatar Feb 05 '22 19:02 thomas694

  1. The two latest tweets were both posted on Feb 3. There are around 60 tweets since the Last Complete Crawl, and the user has a total of 8,796 tweets.
  2. Force rescan is not enabled. However, I still think the software acts as if this setting were enabled. It's always "Evaluated 3500 tumblr posts out of 8,796 total posts" when `Limit exceeded` appears.
  3. 1483991637554614277

T-prog3 avatar Feb 05 '22 21:02 T-prog3

At the moment I don't have a clue why it's crawling that much on this blog. Do you have a value inside the blog's "download pages" setting?

thomas694 avatar Feb 06 '22 13:02 thomas694

No, I have almost everything on default settings. The only things I have changed in the software are:

  • General: Active portable mode: Enabled
  • Connection: Concurrent connections: 1, Concurrent video connections: 1
  • Connection: Limit Tumblr API connections, Number of connections: 30; Limit Tumblr SVC connections, Number of connections: 30
  • Blog: Download reblogged posts: Disabled, Image size (category): Large, Video size (category): Large

T-prog3 avatar Feb 06 '22 17:02 T-prog3

It seems some error occurs during the crawl process that keeps it from updating LastId to the newest post. You could have a look into the TumblThree.log file to see whether there is a hint or error there.

thomas694 avatar Feb 06 '22 19:02 thomas694

This is the error in TumblThree.log

You should lower the connections to the tumblr api in the Settings->Connection pane., System.Net.WebException: The remote server returned an error: (429) Too Many Requests.
   at System.Net.HttpWebRequest.EndGetResponse(IAsyncResult asyncResult)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at TumblThree.Applications.Extensions.TaskTimeoutExtension.<TimeoutAfter>d__0`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at TumblThree.Applications.Services.WebRequestFactory.<ReadRequestToEndAsync>d__12.MoveNext() in C:\projects\Tumblthree\src\TumblThree\TumblThree.Applications\Services\WebRequestFactory.cs:line 129
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at TumblThree.Applications.Crawler.TwitterCrawler.<RequestApiDataAsync>d__25.MoveNext() in C:\projects\Tumblthree\src\TumblThree\TumblThree.Applications\Crawler\TwitterCrawler.cs:line 257
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at TumblThree.Applications.Crawler.TwitterCrawler.<GetRequestAsync>d__24.MoveNext() in C:\projects\Tumblthree\src\TumblThree\TumblThree.Applications\Crawler\TwitterCrawler.cs:line 236
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at TumblThree.Applications.Crawler.TwitterCrawler.<GetApiPageAsync>d__28.MoveNext() in C:\projects\Tumblthree\src\TumblThree\TumblThree.Applications\Crawler\TwitterCrawler.cs:line 339
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at TumblThree.Applications.Crawler.TwitterCrawler.<GetUserTweetsAsync>d__30.MoveNext() in C:\projects\Tumblthree\src\TumblThree\TumblThree.Applications\Crawler\TwitterCrawler.cs:line 364
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at TumblThree.Applications.Crawler.TwitterCrawler.<CrawlPageAsync>d__33.MoveNext() in C:\projects\Tumblthree\src\TumblThree\TumblThree.Applications\Crawler\TwitterCrawler.cs:line 456
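For context, a 429 response normally carries rate-limit headers. A minimal sketch of honoring them, assuming Twitter's x-rate-limit-reset header (epoch seconds) is present; this is not how TumblThree's WebRequestFactory actually handles it:

```csharp
// Sketch: back off on HTTP 429 using Twitter's rate-limit headers.
using System;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

static class RateLimitedGet
{
    static readonly HttpClient client = new HttpClient();

    public static async Task<string> GetAsync(string url)
    {
        while (true)
        {
            HttpResponseMessage resp = await client.GetAsync(url);
            if (resp.StatusCode != (HttpStatusCode)429)
            {
                resp.EnsureSuccessStatusCode();
                return await resp.Content.ReadAsStringAsync();
            }

            // Prefer the reset header; fall back to a fixed delay.
            TimeSpan wait = TimeSpan.FromSeconds(60);
            if (resp.Headers.TryGetValues("x-rate-limit-reset", out var values) &&
                long.TryParse(values.FirstOrDefault(), out long resetEpoch))
            {
                TimeSpan until = DateTimeOffset.FromUnixTimeSeconds(resetEpoch) - DateTimeOffset.UtcNow;
                if (until > TimeSpan.Zero) wait = until;
            }
            await Task.Delay(wait);
        }
    }
}
```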

T-prog3 avatar Feb 06 '22 22:02 T-prog3

This blog downloads without problems here. Even if I try to emulate your situation by adapting the settings and blog file accordingly, it downloads the posts until the one from last time and stops. I don't know what the difference from your system could be.

You could back up the blog's download folder and its two blog files. Then add the blog again and see whether it works again and downloads the missing new posts. Later you can close the app and merge the backed-up files back in, copying the already downloaded entries in "blog"_files.twitter from the backup into the current file (just take all entries; a few duplicates are OK).
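A minimal sketch of such a merge, assuming the _files.twitter file is JSON holding the already-downloaded entries in an array; the "Links" property name below is an assumption, so inspect your own file before running anything like this:

```csharp
// Sketch: merge downloaded entries from a backed-up "blog"_files.twitter
// into the current one. Assumes a JSON array of strings under "Links"
// (an assumption - check the real file layout first). Uses Json.NET.
using System.IO;
using System.Linq;
using Newtonsoft.Json.Linq;

class MergeFilesTwitter
{
    static void Main()
    {
        JObject current = JObject.Parse(File.ReadAllText("blog_files.twitter"));
        JObject backup  = JObject.Parse(File.ReadAllText(@"backup\blog_files.twitter"));

        var merged = current["Links"].Values<string>()
            .Concat(backup["Links"].Values<string>())
            .Distinct()   // duplicates are harmless, but drop them anyway
            .ToList();

        current["Links"] = new JArray(merged);
        File.WriteAllText("blog_files.twitter", current.ToString());
    }
}
```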

thomas694 avatar Feb 08 '22 20:02 thomas694

Report from start to end:

  1. I just downloaded the latest TumblThree-v2.5.1-x64-Application.zip
  2. Unzipped it and opened TumblThree.exe
  3. Without changing any default settings at all, I added some random users with a large number of tweets (5000+)
  4. Enqueued all added users and pressed Crawl
  5. It started to download files from the first user
  6. Got 4151 files (3944 videos/images + texts.txt)
  7. Then the error occurred: Error 1: Limit exceeded: username You should lower the connections to the tumblr api in the Settings->Connection pane.
  8. Apparently this Twitter user had 50864 posts, so it was nowhere near completion, and there were still 3 other users to go.
  9. Waited until "waiting until date/time"
  10. Got a new "waiting until date/time"
  11. I pressed Stop
  12. Got a new status saying "Calculating unique downloads, removing duplicates ..."
  13. This took forever, and 20 minutes later I terminated the software.
  14. Started the software again
  15. Enqueued all users again and pressed Crawl
  16. It started with the same first user again, but this time showed something about "File already downloaded ... Skipping"
  17. It got to the point where it started to download some new files
  18. Now I have 4174 files downloaded (3964 videos/images + texts.txt)
  19. After these 23 (!) new files (20 videos/images) were downloaded, the error occurred again.
  20. Error 1: Limit exceeded: username You should lower the connections to the tumblr api in the Settings->Connection pane.
  21. Terminated the software

Conclusions:

  1. The Twitter part of the software works up to a certain limit, but it takes forever to get any files beyond that limit. With only 20 new files the second time around, it will take days to complete the first user, if it ever reaches the finish line.

  2. All skipped files seem to be counted as requests that add to the limit counter.

Log: No TumblThree.log to be found in the TumblThree-v2.5.1-x64-Application folder.

T-prog3 avatar Feb 08 '22 23:02 T-prog3

Ok, but now we are talking about a different thing, aren't we? It's no longer about downloading a few dozen recent posts, but about downloading historic posts (i.e., complete blogs). Twitter doesn't want more posts than a certain limit to be downloaded. Obviously they changed something. We have to see whether we can find a solution or not.

The download of the "post lists" counts towards the limit, whether a post's media is downloaded or skipped.
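To put numbers on that, a back-of-envelope calculation with an assumed page size (Twitter's timeline endpoints typically return at most a few dozen to a hundred tweets per page):

```csharp
// Back-of-envelope (top-level C# program): list requests needed just
// to enumerate a blog, before any media is downloaded or skipped.
using System;

int totalPosts = 50864;   // the blog from the report above
int postsPerPage = 100;   // assumed page size; the real value may be lower
int listRequests = (totalPosts + postsPerPage - 1) / postsPerPage;
Console.WriteLine($"{listRequests} list requests at minimum");  // 509
```

So enumerating a 50k-post blog alone takes hundreds of list requests at best, which is why skipping already-downloaded files doesn't help against the limit.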

thomas694 avatar Feb 09 '22 12:02 thomas694

To my understanding:

I see no difference between updating an already downloaded blog and a completely new download. Both show the same Number of posts in the active user's download queue.

In other words, you will never be able to update/download the second blog in the download queue if the first user has a large Number of posts. The problem is that the software makes a request for each and every post the user has, no matter whether you update or download a new user. So you don't get only the recent 100 posts you haven't downloaded yet; you get the full blog in the queue, no matter what.

The problem with updating a blog would not be a problem if you only got the recent posts between Now and Last Complete Crawl in the queue.

  • But as of now, you get all of the user's Number of posts in the queue, so the user will never complete, and because of that a new date in Last Complete Crawl will never be set.
  • So we can't be sure whether we updated a blog or not.
  • The download process doesn't continue after the limit's waiting time has ended.

Problem summary:

  • Updating all your Twitter blogs is no longer possible.
  • You can't download a complete blog if it's larger than Twitter's limit, because the crawl doesn't continue after the waiting time is over.
  • You can't get a new Last Complete Crawl date if the crawl never completes, and then you can't see whether it's updated.
  • Updating a user acts the same way as downloading a new one. The same amount of posts in the user queue makes them break at almost the same point.
  • If a user has 50k posts and you downloaded 10k before the new Twitter changes, the best you get is about 5k more files. The 35k files in the middle are untouchable.
  • Updating more than one user is impossible if they each have over 5k posts (this includes all text tweets).
  • Updating small blogs succeeds as long as they individually or together don't reach x amount of posts, but they will fail to complete the day they do.

T-prog3 avatar Feb 09 '22 15:02 T-prog3

First, you experience (or at least describe) something that I don't see here. It looks like most other users can update their existing blogs, too.

The problem with updating a blog would not be a problem if you only got the recent posts between Now and Last Complete Crawl in the queue.

That's exactly what we are doing, using precisely LastId (set after a successful complete crawl).
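A minimal sketch of that idea, assuming the crawler pages from newest to oldest and stops at the stored LastId; everything here (Tweet, GetPageAsync, etc.) is a placeholder, not the real TwitterCrawler API:

```csharp
// Sketch of incremental crawling: page from newest to oldest and stop
// once the LastId from the previous complete crawl is reached.
using System.Collections.Generic;
using System.Threading.Tasks;

public class Tweet { public long Id; public string Url; }

public abstract class IncrementalCrawlerSketch
{
    // One "post list" request - this is what counts toward the limit.
    protected abstract Task<(IReadOnlyList<Tweet> page, string nextCursor)> GetPageAsync(string cursor);
    protected abstract void EnqueueDownload(Tweet t);
    protected abstract void SaveLastId(long id);

    public async Task CrawlNewPostsAsync(long lastId)
    {
        long newestSeen = 0;
        string cursor = null;

        while (true)
        {
            var (page, nextCursor) = await GetPageAsync(cursor);
            if (page.Count == 0) break;                    // no more pages

            if (newestSeen == 0) newestSeen = page[0].Id;  // newest tweet overall

            foreach (Tweet t in page)
            {
                if (t.Id <= lastId)                        // caught up with last crawl
                {
                    SaveLastId(newestSeen);
                    return;
                }
                EnqueueDownload(t);
            }
            cursor = nextCursor;
        }
        if (newestSeen != 0) SaveLastId(newestSeen);       // fewer posts than one crawl
    }
}
```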

In other words, you will never be able to update/download the second blog in the download queue

Not automatically and unattended, yes. You can, for example, remove this blog from the download queue, which stops its crawler and continues with the next one.

Let me summarize what I get (and probably others too):

  • Small blogs can be downloaded and updated without problems.
  • Any reasonably up-to-date blog can be updated without problems.
  • Only big blogs can no longer be downloaded completely and thus updated later. Experienced users could at least update them with a little tweaking (LastId).

The last point needs to be fixed, so that all posts to the limit are downloaded and then the blog is marked as completely downloaded. This limit exists...[#161] That a workaround will not work forever should be clear and understandable.

Obviously they changed something. We have to see, whether we can find a [workaround] solution or not.

If you know how to fix it, you are welcome to do so (or share it).

@Hrxn @desbest @cr1zydog I hope you don't mind. Can you still update your existing twitter blogs?

thomas694 avatar Feb 09 '22 21:02 thomas694

@Hrxn @desbest @cr1zydog I hope you don't mind. Can you still update your existing twitter blogs?

I've never used Twitter with this App before, so my own experience here is a little limited.

That said, what you state here is obviously true:

  • Small blogs can be downloaded and updated without problems.
  • Any reasonably up-to-date blog can be updated without problems.
  • Only big blogs can no longer be downloaded completely and thus updated later. Experienced users could at least update them with a little tweaking (LastId).

The last point needs to be fixed, so that all posts to the limit are downloaded and then the blog is marked as completely downloaded. This limit exists...[#161] That a workaround will not work forever should be clear and understandable.

Obviously they changed something. We have to see, whether we can find a [workaround] solution or not.

The third point is the real issue, as I understand it, and yes, this is a limitation due to how Twitter works.

Hrxn avatar Feb 10 '22 09:02 Hrxn

I can't download any blogs, new or old, few posts or large.

desbest avatar Feb 14 '22 10:02 desbest

I had this problem several months ago, but it hasn't bothered me since, and I didn't change anything other than the routine TumblThree updates. I catch up with all my Tumblr blogs once a month and add any newly discovered ones. I'm now following 257 Tumblr blogs (I know, I'm hooked!), and the last catch-up on the first of the month was 147 GB and 404,000 files. It took almost 24 hours to harvest everything, but ran perfectly.
I'm using all default settings.

Someonemustnothavethis avatar Apr 08 '22 11:04 Someonemustnothavethis