Could the program run multiple queries in parallel?
Hello!!
The program is awesome, congrats!
I am using the program to run very heavy queries that take a long time to complete, and I was wondering whether the program could be used to run queries in parallel and, if so, what would need to be changed.
It depends on your queries.
The algorithm works like this: query for results, get the cursor pointing to the next batch of results, and repeat: https://github.com/Mottl/GetOldTweets3/blob/34078e10e5d7b053c3495a1517e6b529a914fb37/GetOldTweets3/manager/TweetManager.py#L64-L69
So in the general case you can't run a single query in parallel, because you don't know the next cursor (min_position) until the previous batch has been fetched.
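A minimal sketch of why a single query is inherently sequential, assuming a hypothetical `fetch_page(cursor)` that returns one batch of tweets plus the cursor for the next batch (in GetOldTweets3 that cursor is the `min_position` value). The in-memory `FAKE_PAGES` is a made-up stand-in for Twitter so the loop can run offline.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical fetcher: takes the current cursor, returns (tweets, next cursor or None).
Fetcher = Callable[[Optional[str]], Tuple[List[str], Optional[str]]]


def iterate_pages(fetch_page: Fetcher) -> Iterator[str]:
    """Yield tweets batch by batch; every request needs the previous response's cursor."""
    cursor: Optional[str] = None  # the first request is sent without a cursor
    while True:
        tweets, next_cursor = fetch_page(cursor)
        yield from tweets
        if not tweets or next_cursor is None:
            break  # no more results
        # The next cursor only exists once this batch has arrived,
        # so the requests of one query cannot be issued concurrently.
        cursor = next_cursor


# Made-up "server" for the demo: cursor -> (batch of tweets, next cursor).
FAKE_PAGES = {
    None: (["t1", "t2"], "c1"),
    "c1": (["t3", "t4"], "c2"),
    "c2": (["t5"], None),
}

if __name__ == "__main__":
    print(list(iterate_pages(lambda cursor: FAKE_PAGES[cursor])))
    # ['t1', 't2', 't3', 't4', 't5']
```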
But you can run queries in parallel if you can split your single query into several. For example, if you use the --since or --until parameters, simply split a long period of time into smaller chunks and run multiple GetOldTweets3 processes with different --since and --until values. You can also split a big query by usernames.
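A minimal sketch of the date-splitting approach in Python. The helper names (`split_dates`, `build_command`, `run_chunk`, `run_all`), the per-chunk working directories and the default of 4 workers are illustrative assumptions, not part of GetOldTweets3; the flags are the ones used in the commands in this thread. Each chunk runs in its own directory on the assumption that the CLI writes its results to a fixed, relative default file name, which parallel runs in the same directory would otherwise overwrite.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from datetime import date, timedelta
from pathlib import Path
from typing import List, Tuple


def split_dates(start: date, end: date, days: int) -> List[Tuple[date, date]]:
    """Split [start, end) into consecutive (since, until) chunks of at most `days` days.

    Each chunk's `since` equals the previous chunk's `until`, because --until is exclusive.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    chunks = []
    since = start
    while since < end:
        until = min(since + timedelta(days=days), end)
        chunks.append((since, until))
        since = until
    return chunks


def build_command(query: str, since: date, until: date) -> List[str]:
    """Build one GetOldTweets3 invocation with the same flags as the commands in this thread."""
    return ["GetOldTweets3", "--lang", "en", "--querysearch", query,
            "--since", since.isoformat(), "--until", until.isoformat()]


def run_chunk(cmd: List[str], workdir: Path) -> int:
    """Run one chunk in its own (hypothetical) directory; return the process exit code."""
    workdir.mkdir(exist_ok=True)
    return subprocess.run(cmd, cwd=workdir).returncode


def run_all(query: str, start: date, end: date, days: int = 1, max_workers: int = 4) -> List[int]:
    """Run all chunks concurrently; keep max_workers modest because of per-IP limits."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(run_chunk, build_command(query, s, u),
                        Path(f"chunk_{s.isoformat()}_{u.isoformat()}"))
            for s, u in split_dates(start, end, days)
        ]
        return [f.result() for f in futures]
```

For example, `run_all("bitcoin", date(2018, 2, 18), date(2018, 3, 1))` would launch one-day chunks from 2018-02-18 up to, but not including, 2018-03-01, at most four at a time. Threads are enough here because each one only waits on an external process; the worker count is an arbitrary choice to tune against Twitter's apparent per-IP limits.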
It seems that Twitter applies some per-IP limits, so keep this in mind.
OK, awesome! My queries are date-based, so I will try running multiple queries at the same time, as you suggested.
Thanks a lot!
If you split by dates, then the --since param of each interval must equal the --until param of the previous interval (do not add 1 day!).
You mean that I will have to send queries like:
GetOldTweets3 --lang en --querysearch "bitcoin" --since 2018-02-18 --until 2018-02-19
GetOldTweets3 --lang en --querysearch "bitcoin" --since 2018-02-19 --until 2018-02-20
?
Yes, exactly.
That's because the --until date is NOT INCLUDED in the results; this is Twitter's behaviour.
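A small sketch of why adjacent intervals must share a boundary, assuming day-level dates where --since is included and --until is excluded (the half-open interval `[since, until)`). The helper names are hypothetical; the code simply counts how many intervals would return each day.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

# (since, until) pair; assumed half-open: since included, until excluded.
Interval = Tuple[date, date]


def in_interval(day: date, interval: Interval) -> bool:
    since, until = interval
    return since <= day < until


def coverage(intervals: List[Interval], start: date, end: date) -> Dict[str, int]:
    """For each day in [start, end), count how many intervals would return it."""
    counts: Dict[str, int] = {}
    day = start
    while day < end:
        counts[day.isoformat()] = sum(in_interval(day, iv) for iv in intervals)
        day += timedelta(days=1)
    return counts


# Correct split: each --since equals the previous --until.
GOOD = [(date(2018, 2, 18), date(2018, 2, 19)), (date(2018, 2, 19), date(2018, 2, 20))]
# Wrong split: one day added to the next --since.
BAD = [(date(2018, 2, 18), date(2018, 2, 19)), (date(2018, 2, 20), date(2018, 2, 21))]

if __name__ == "__main__":
    print(coverage(GOOD, date(2018, 2, 18), date(2018, 2, 20)))
    # {'2018-02-18': 1, '2018-02-19': 1}
    print(coverage(BAD, date(2018, 2, 18), date(2018, 2, 21)))
    # {'2018-02-18': 1, '2018-02-19': 0, '2018-02-20': 1}
```

Every day is covered exactly once with `GOOD`, while the "+1 day" split in `BAD` silently drops 2018-02-19.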
OK, perfect!!
Thanks a lot!