twitter-archive-parser

added randomizer to the request delay

Open MarquisTheCoder opened this issue 3 years ago • 4 comments
Hello! I just added a simple randomization of the request delay to further reduce the chance of being cut off.

MarquisTheCoder avatar Nov 14 '22 18:11 MarquisTheCoder

Hello. Thanks for making a PR.

One change per PR please.

What's the motivation for the randomizing? Have we seen any problems with the fixed sleep?

timhutton avatar Nov 14 '22 19:11 timhutton

For good contributions to a project I suggest:

  • Make an issue first, identifying a problem or suggesting a change.
  • Get consensus from the community that it's a valid issue and worth working on.
  • In that issue, propose a solution, invite comments on how it would work.
  • If there's broad agreement on the shape of the solution then go ahead and make a PR. If in doubt just ask.

timhutton avatar Nov 14 '22 20:11 timhutton

Thank you for the help. I'm still learning, and I'll take this into account in the future.

MarquisTheCoder avatar Nov 15 '22 17:11 MarquisTheCoder

As for the fixed delay between requests: some websites block access to prevent web scraping, and scraping is easy to detect when a Python script sends many requests in a short period at a constant interval. Adding a random delay between requests makes the pattern less regular and increases the chance that at least some of your requests land on a delay the server will not block. I didn't make the delay much larger than the original, but it certainly doesn't hurt!
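
For illustration, here is a minimal sketch of the idea, not the code from the PR. The names `jittered_delay` and `fetch_all`, and the 1.0 second base with up to 0.5 seconds of jitter, are hypothetical values chosen for the example; the `fetch` callable stands in for whatever performs the actual request.

```python
import random
import time
from typing import Callable, Iterable, List, Optional


def jittered_delay(base_seconds: float, jitter_seconds: float,
                   rng: Optional[random.Random] = None) -> float:
    """Return base_seconds plus a uniform random extra in [0, jitter_seconds]."""
    source = rng if rng is not None else random
    return base_seconds + source.uniform(0.0, jitter_seconds)


def fetch_all(urls: Iterable[str],
              fetch: Callable[[str], object],
              base_seconds: float = 1.0,    # hypothetical base delay
              jitter_seconds: float = 0.5,  # hypothetical maximum extra delay
              sleep: Callable[[float], None] = time.sleep) -> List[object]:
    """Call fetch(url) for each URL, pausing a randomized time between calls."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Sleep only between requests, never before the first one.
            sleep(jittered_delay(base_seconds, jitter_seconds))
        results.append(fetch(url))
    return results
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting. Whether jitter actually helps against a given server's rate limiting is an assumption, not something this thread measured.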

MarquisTheCoder avatar Nov 15 '22 17:11 MarquisTheCoder

@MarquisTheCoder I'm closing this because the sleep strategy in download_better_images.py has changed completely. The other refactorings might be worth making a new PR for if you are motivated.

timhutton avatar Nov 17 '22 16:11 timhutton