Substack2Markdown
Fix Error Too Many Requests
This commit introduces improvements to the get_url_soup function to handle rate-limiting errors (HTTP 429) more effectively. Below are the details of the changes:
Changes Made:
Added max_attempts Parameter:
- Introduced a new parameter max_attempts with a default value of 5, allowing multiple retry attempts when temporary errors occur.
Handling "Too Many Requests" (HTTP 429):
- Implemented an exponential backoff retry mechanism in case the page content indicates "too many requests."
- Added random jitter to the delay between retries to reduce the likelihood of triggering server rate limits.
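The delay calculation described above could look roughly like the sketch below. It is an illustration only, not the code from this commit: the function name backoff_delay and the base, cap, and jitter values are assumptions chosen for the example.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  jitter: float = 1.0) -> float:
    """Return the wait in seconds before retry number `attempt` (0-based).

    The base delay doubles on every attempt (exponential backoff), is capped
    at `cap`, and has a random amount between 0 and `jitter` added on top.
    All default values here are assumptions, not taken from the commit.
    """
    exponential = min(cap, base * (2 ** attempt))
    return exponential + random.uniform(0, jitter)
```

With these assumed defaults, attempt 0 waits between 1 and 2 seconds and attempt 3 waits between 8 and 9 seconds, so the waits of successive retries do not fall on a fixed schedule.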
Improved Error Messages:
- Enhanced error messages to include the URL and number of failed attempts for better debugging.
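As a rough sketch of how max_attempts and the more detailed error message might fit together, consider the loop below. It is not the actual get_url_soup implementation: fetch_with_retries, fetch, is_rate_limited, and sleep are hypothetical names introduced here so the example runs without network access, and jitter is left out for brevity.

```python
import time
from typing import Callable


def fetch_with_retries(url: str,
                       fetch: Callable[[str], str],
                       is_rate_limited: Callable[[str], bool],
                       max_attempts: int = 5,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Call fetch(url) until the page is not rate limited or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        page = fetch(url)
        if not is_rate_limited(page):
            return page
        if attempt < max_attempts:
            # Wait 1s, 2s, 4s, ... before the next try (jitter omitted here).
            sleep(2 ** (attempt - 1))
    # Include the URL and the attempt count so failures are easy to trace.
    raise RuntimeError(
        f"Failed to fetch {url}: still rate limited after {max_attempts} attempts"
    )
```

Passing fetch and sleep as parameters is a design choice for this example only; it lets the loop be exercised with stub functions instead of real requests and real delays.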
Additional Logic:
- Checks if the page contains a <pre> element with the text "too many requests" and retries after a delay if detected.
- Checks if the page contains a