TumblThree
Twitter Rate Limit is still an issue
Describe the bug
Still being rate limited on the Twitter API, together with the suggestion to lower the connections in Settings. That, however, makes no difference at all. I tried as low as 10 for `Number of connections in 60s` with only 1 concurrent connection. To my understanding of the Twitter API rate limits (https://developer.twitter.com/en/docs/twitter-api/rate-limits), this shouldn't be an issue.
This also raises the question of whether the settings only affect the Tumblr API. Should Tumblr and Twitter really be treated under the same settings and name? And shouldn't there also be a way to authenticate a Twitter account? That would allow you to crawl users who only allow followers.
Desktop (please complete the following information):
- TumblThree version: v2.5.0
Today and right at the moment it's working (here). Did you already download a bit when you got the error? Then it would be a real "limit exceeded". Or your current IP may be blocked for some reason. Or they are rolling out a new change which you already see and other regions will see it soon.
The settings affect the crawlers for both. Also, the default settings are, in absolute terms, a bit too high for Twitter's API limits, but they work for normal crawling/downloading because of the time spent between requests. Obviously there are no separate settings yet; maybe they will be needed in the future.
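As a rough back-of-the-envelope illustration (a sketch only; TumblThree talks to Twitter's unofficial web endpoints, so the window length and limit below are assumptions, not confirmed values):

```python
# Rough illustration of why the defaults are "in absolute terms a bit too high".
# ASSUMPTION: a 15-minute window with ~450 allowed requests, used here purely
# as a placeholder; the real limits of the endpoints TumblThree uses may differ.
connections_per_60s = 30          # default "Limit Tumblr API connections"
window_minutes = 15               # assumed rate-limit window length
assumed_window_limit = 450        # assumed requests per window (placeholder)

requests_per_window = connections_per_60s * window_minutes   # 450
print(f"{requests_per_window} requests per window vs. an assumed limit of {assumed_window_limit}")
# The defaults already sit right at such a limit; only the time spent between
# requests during normal downloading keeps the real rate lower.
```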
There is room for improvements. Contributions are welcome.
I actually have been trying to update one user that was already downloaded once, two weeks ago. The error happens within the first minute of running, during `Evaluated N tumblr posts out of N total posts`. It doesn't download anything new, and then I get `Error 1: Limit exceeded: username You should lower the connections to tumblr API in the Settings -> Connection pane.` Then I get the message `waiting until date/time`, but at that time it only pushes the date/time forward and doesn't make any progress, even after 1 hour. So there appears to be no way to make a complete update of already downloaded users (as of today). My `Last Complete Crawl` will stay stuck at 2022-01-20.
Please open this blog in the browser and tell me when the first two posts were posted. Do you have `force rescan` enabled in this blog's settings? What is the value of `LastId` in this blog's index file?
- The two latest tweets were both posted on Feb 3. There are around 60 tweets since the `Last Complete Crawl`, and the user has a total of 8,796 tweets.
- `force rescan` is not enabled. However, I still think the software acts as if this setting were enabled. It always says `Evaluated 3500 tumblr posts out of 8,796 total posts` when `Limit exceeded` appears.
- `1483991637554614277`
At the moment I don't have a clue why it's crawling that much on this blog. Do you have a value inside the blog's `download pages` setting?
No, I have almost everything on default settings. The only things I have changed in the software are:

General:
- Active portable mode: Enabled

Connection:
- Concurrent connections: 1
- Concurrent video connections: 1
- Limit Tumblr API connections: Number of connections 30
- Limit Tumblr SVC connections: Number of connections 30

Blog:
- Download reblogged posts: Disabled
- Image size (category): Large
- Video size (category): Large
It seems some error occurs during the crawl process that keeps it from updating `LastId` to the newest post. You could have a look into the `TumblThree.log` file and see whether there is a hint/error in it.
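If the log has grown large, a quick filter such as this narrows it down to the error lines (a minimal sketch; the path assumes portable mode, where the log sits next to the executable, otherwise adjust it):

```python
# Minimal sketch: print only the lines of TumblThree.log that mention errors.
# ASSUMPTION: portable mode, so the log file is next to TumblThree.exe.
from pathlib import Path

log_path = Path("TumblThree.log")
for line in log_path.read_text(encoding="utf-8", errors="replace").splitlines():
    if "Error" in line or "Exception" in line:
        print(line)
```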
This is the error in TumblThree.log
You should lower the connections to the tumblr api in the Settings->Connection pane., System.Net.WebException: The remote server returned an error: (429) Too Many Requests.
   at System.Net.HttpWebRequest.EndGetResponse(IAsyncResult asyncResult)
   at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization)
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at TumblThree.Applications.Extensions.TaskTimeoutExtension.<TimeoutAfter>d__0`1.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at TumblThree.Applications.Services.WebRequestFactory.<ReadRequestToEndAsync>d__12.MoveNext() in C:\projects\Tumblthree\src\TumblThree\TumblThree.Applications\Services\WebRequestFactory.cs:line 129
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at TumblThree.Applications.Crawler.TwitterCrawler.<RequestApiDataAsync>d__25.MoveNext() in C:\projects\Tumblthree\src\TumblThree\TumblThree.Applications\Crawler\TwitterCrawler.cs:line 257
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at TumblThree.Applications.Crawler.TwitterCrawler.<GetRequestAsync>d__24.MoveNext() in C:\projects\Tumblthree\src\TumblThree\TumblThree.Applications\Crawler\TwitterCrawler.cs:line 236
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at TumblThree.Applications.Crawler.TwitterCrawler.<GetApiPageAsync>d__28.MoveNext() in C:\projects\Tumblthree\src\TumblThree\TumblThree.Applications\Crawler\TwitterCrawler.cs:line 339
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at TumblThree.Applications.Crawler.TwitterCrawler.<GetUserTweetsAsync>d__30.MoveNext() in C:\projects\Tumblthree\src\TumblThree\TumblThree.Applications\Crawler\TwitterCrawler.cs:line 364
--- End of stack trace from previous location where exception was thrown ---
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at TumblThree.Applications.Crawler.TwitterCrawler.<CrawlPageAsync>d__33.MoveNext() in C:\projects\Tumblthree\src\TumblThree\TumblThree.Applications\Crawler\TwitterCrawler.cs:line 456
This blog downloads without problems here. Even if I try to emulate your situation by adapting the settings and blog file accordingly, it downloads the posts until the one from last time and stops. I don't know what the difference to your system could be.
You could back up the blog's download folder and its two blog files. Then you can add the blog again and see whether it works and downloads the missing new posts. Later you can close the app and merge the backed-up files and the already-downloaded entries in "blog"_files.twitter from the copy into the current one (just all entries; a few duplicates are ok).
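If you prefer to script that merge instead of doing it by hand, something along these lines should work (a minimal sketch, run while TumblThree is closed; the paths are placeholders and the `Links` key is an assumption about the `_files.twitter` layout, so check your own file first and keep the backup):

```python
# Minimal sketch: merge the already-downloaded entries of a backed-up
# "blog"_files.twitter into the current one.
# ASSUMPTIONS: both files are JSON and the downloaded entries live in a list
# under LIST_KEY; verify against your own files before running.
import json

BACKUP_FILE = r"backup\blogname_files.twitter"   # placeholder path
CURRENT_FILE = r"Blogs\blogname_files.twitter"   # placeholder path
LIST_KEY = "Links"                               # assumed key name

with open(BACKUP_FILE, encoding="utf-8") as f:
    backup = json.load(f)
with open(CURRENT_FILE, encoding="utf-8") as f:
    current = json.load(f)

# Append every backed-up entry that is not already present.
# (Duplicates would not hurt, per the note above, but we skip them anyway.)
existing = {str(entry) for entry in current[LIST_KEY]}
for entry in backup[LIST_KEY]:
    if str(entry) not in existing:
        current[LIST_KEY].append(entry)

with open(CURRENT_FILE, "w", encoding="utf-8") as f:
    json.dump(current, f)
```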
Report from start to end:
- I just downloaded the latest `TumblThree-v2.5.1-x64-Application.zip`
- Unzipped and opened `TumblThree.exe`
- Without changing any default settings at all, I added some random users with a large number of tweets (5000+)
- Enqueued all added users and pressed `Crawl`
- It started to download files from the first user
- Got 4151 files (3944 videos/images + texts.txt)
- Then the error occurred: `Error 1: Limit exceeded: username You should lower the connections to tumblr API in the Settings -> Connection pane.`
- Apparently this Twitter user has 50864 posts, so it was nowhere near completion, and there were still 3 other users to go.
- Waited until the `waiting until date/time`
- Got a new `waiting until date/time`
- I pressed `Stop`
- Got a new status saying `Calculating unique downloads, removing duplicates ...`
- This took forever, and 20 minutes later I terminated the software.
- Started the software again
- Enqueued all users again and pressed `Crawl`
- It started with the same first user again, but this time showed something about `File Already downloaded.... Skipping`
- It got to the point where it started to download some new files
- Now I have 4174 files downloaded (3964 videos/images + texts.txt)
- After these 23(!) new files (20 videos/images) were downloaded, the error occurred again: `Error 1: Limit exceeded: username You should lower the connections to tumblr API in the Settings -> Connection pane.`
- Terminated the software
Conclusions:
- The Twitter part of the software works up to a certain limit, but it will take forever to get any files beyond that limit. With only 20 new files the second time around, it will take days to complete the first user, if it ever reaches the finish line.
- All skipped files seem to be counted as requests that add to the limit counter.
Log:
No `TumblThree.log` to be found in the TumblThree-v2.5.1-x64-Application folder.
OK, but now we are talking about a different thing, aren't we? It's no longer about downloading a few dozen recent posts, but about downloading historic posts (i.e., complete blogs). Twitter doesn't want more posts than a certain limit to be downloaded. Obviously they changed something. We have to see whether we can find a solution or not.
The download of the "post lists" counts towards the limit, whether a post's media is downloaded or skipped.
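To put rough numbers on that (a sketch; the page size of the timeline endpoint is an assumption used only for illustration):

```python
# Illustration of why skipping already-downloaded media does not help much:
# the crawler still pages through the whole timeline, and every "post list"
# page it fetches counts against the limit.
# ASSUMPTION: ~100 posts per timeline page; the real page size may differ.
total_posts = 50864        # the big blog from the report above
posts_per_page = 100       # assumed page size

paging_requests = -(-total_posts // posts_per_page)   # ceiling division
print(paging_requests)     # ~509 list requests before counting any media downloads
```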
To my understanding:
I see no difference between updating an already downloaded blog and a complete new download. Both have the same `Number of posts` in the active user's download queue. In other words, you will never be able to update/download the second blog in the download queue if the first user has a large `Number of posts`. The problem is that the software makes a request for each and every post the user has, no matter whether you do an update or download a new user. So you do not only get the recent 100 posts you haven't downloaded yet; you get the full blog in the queue no matter what.
The problem with updating a blog would not be a problem if you only got the recent posts between now and `Last Complete Crawl` in the queue.
- But as of now, you get all of the user's `Number of posts` in the queue. We then have the problem where the user will never complete, and because of that, a new date in `Last Complete Crawl` will never be set.
- So we can't be sure whether we updated a blog or not.
- The download process doesn't continue after the limit's waiting time has ended.
Problem summary:
- Updating all your Twitter blogs is no longer possible.
- You can't download a complete blog if it's larger than the Twitter limit, because the crawl doesn't continue after the waiting time is over.
- You can't get a new `Last Complete Crawl` date if the crawl never completes, and then you can't see whether a blog is up to date.
- Updating a user acts the same way as downloading a new one. The same number of posts in the user's queue makes them break at almost the same point.
- If a user has 50k posts and you downloaded 10k before the new Twitter changes, the best you get is about 5k more files; the 35k files in the middle are untouchable.
- Updating more than one user is impossible if they each have over 5k posts (this includes all text tweets).
- Updating small blogs succeeds as long as they individually or together don't reach X posts, but they will fail to complete the day they do.
First, you experience and describe something that I don't see here. It looks like most other users can update their existing blogs, too.

> The problem with updating a blog would not be a problem if you only got the recent posts between now and Last Complete Crawl in the queue.

That's exactly what we're doing, precisely via `LastId` (after a successful complete crawl).
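Conceptually (a simplified sketch of the behavior described here, not the actual crawler code), an update walks the timeline from newest to oldest and stops as soon as it reaches the stored `LastId`:

```python
# Simplified sketch of the update behavior described above (not the real code):
# walk the timeline newest-to-oldest and stop at the remembered LastId.
def collect_new_posts(timeline_pages, last_id):
    new_posts = []
    for page in timeline_pages:        # each page lists post ids, newest first
        for post_id in page:
            if post_id == last_id:     # everything older was already crawled
                return new_posts
            new_posts.append(post_id)
    return new_posts                   # LastId never seen -> effectively a full crawl
```

The catch is the last line: if `LastId` was never written because the previous crawl did not complete, the next run degenerates into a full crawl again.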
> In other words, you will never be able to update/download the second blog in the download queue

Not automatically and unattended, correct. You can, for example, remove this blog from the download queue, which stops its crawler and continues with the next one.
Let me summarize what I get (and probably others, too):
- Small blogs can be downloaded and updated without problems.
- Any reasonably up-to-date blog can be updated without problems.
- Only big blogs can no longer be downloaded completely and thus updated later. Experienced users could at least update them with a little tweaking (`LastId`; see the sketch at the end of this comment).

The last point needs to be fixed, so that all posts up to the limit are downloaded and the blog is then marked as completely downloaded. This limit exists...[#161] That a workaround will not work forever should be clear and understandable.

> Obviously they changed something. We have to see whether we can find a [workaround] solution or not.

If you know how to fix it, you are welcome to do so (or share it).
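As for the `LastId` tweaking mentioned above, the idea is to set `LastId` in the blog's index file by hand to the newest tweet you already have, so the next crawl treats everything older as done. A minimal sketch (do this while the app is closed and keep a backup; the path, the JSON layout and the field name are assumptions, so check your own index file first):

```python
# Minimal sketch of manually setting LastId in a blog's index file.
# ASSUMPTIONS: the index file is JSON and contains a "LastId" field; the path
# is a placeholder. Verify both against your own file before editing anything.
import json

INDEX_FILE = r"Blogs\blogname.twitter"     # placeholder path to the blog index file
NEW_LAST_ID = "1483991637554614277"        # id of the newest tweet already on disk

with open(INDEX_FILE, encoding="utf-8") as f:
    blog = json.load(f)

# Match the type already stored in the file (string vs. number).
existing = blog.get("LastId")
blog["LastId"] = int(NEW_LAST_ID) if isinstance(existing, int) else NEW_LAST_ID

with open(INDEX_FILE, "w", encoding="utf-8") as f:
    json.dump(blog, f)
```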
@Hrxn @desbest @cr1zydog I hope you don't mind. Can you still update your existing twitter blogs?
I've never used Twitter with this app before, so my own experience here is a little limited.
That said, what you state here is obviously true:

> - Small blogs can be downloaded and updated without problems.
> - Any reasonably up-to-date blog can be updated without problems.
> - Only big blogs can no longer be downloaded completely and thus updated later. Experienced users could at least update them with a little tweaking (`LastId`).
>
> The last point needs to be fixed, so that all posts up to the limit are downloaded and the blog is then marked as completely downloaded. This limit exists...[#161] That a workaround will not work forever should be clear and understandable.
>
> Obviously they changed something. We have to see whether we can find a [workaround] solution or not.

The third point is the real issue, as I understand it, and yes, this is a limitation due to how Twitter works.
I can't download any blogs, new or old, with few posts or many.
I had this problem several months ago, but it hasn't bothered me since, and I didn't change anything other than the routine TumblThree updates. I catch up with all my Tumblr blogs once a month and add any newly discovered ones. I'm now following 257 Tumblr blogs (I know, I'm hooked!), and the last catch-up on the first of the month was 147 GB and 404,000 files. It took almost 24 hours to harvest everything, but it ran perfectly.
I'm using all default settings.