Discounted-Udemy-Course-Enroller
Fix Various Issues and Improve Scraper Script
Pull Request Summary
New Updates (Commits on Jun 15, 2024):
Error Handling Improvements
- Improved Error Handling: Added comprehensive error handling in the `enrol` function so that the script continues processing even if an error occurs during execution. Each critical operation is now wrapped in a try-except block, allowing the script to skip a problematic iteration and proceed with the next.
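
As a minimal sketch of the skip-and-continue pattern described above (not the actual `enrol` implementation), each iteration can be wrapped in its own try-except so that one failing course does not stop the loop. The `process_course` and `enrol_all` names and the course dictionaries are hypothetical and exist only for illustration.

```python
def process_course(course):
    """Hypothetical stand-in for the per-course work done during enrolment."""
    if not course.get("url"):
        raise ValueError(f"missing URL for {course.get('title', 'unknown course')}")
    return f"enrolled: {course['title']}"


def enrol_all(courses):
    """Process every course, skipping any iteration that raises."""
    results, failures = [], []
    for course in courses:
        try:
            results.append(process_course(course))
        except Exception as exc:  # broad on purpose: record the error and move on
            failures.append((course.get("title"), str(exc)))
            continue
    return results, failures
```

Collecting the failures instead of only printing them makes it possible to report the skipped items once the run has finished.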
Known Issues
- Language Exclusion: The script still subscribes to free courses in languages that are excluded in the settings. For example, courses in Arabic are still subscribed to even though Arabic is set to `false` in `duce-cli-settings.json`. This issue will be addressed in a future update.
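
For reference, one way such an exclusion check could look is sketched below. It assumes the settings file holds a `languages` mapping from language name to a boolean; that shape, the `is_language_allowed` helper, and the sample JSON are assumptions for illustration, not the project's actual code or file layout.

```python
import json

# Assumed shape of the relevant part of the settings file; the real
# duce-cli-settings.json may be structured differently.
SETTINGS_JSON = '{"languages": {"English": true, "Arabic": false}}'


def is_language_allowed(course_language, settings):
    """Return True only if the course language is explicitly enabled."""
    languages = settings.get("languages", {})
    # Languages missing from the mapping are treated as excluded in this sketch.
    return bool(languages.get(course_language, False))


settings = json.loads(SETTINGS_JSON)
```

With a check like this, a course would be skipped before enrolment whenever `is_language_allowed(course_language, settings)` returns `False`.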
Previous Changes (Commits on Jun 15, 2024):
Improve Link Extraction and Error Handling in Scrapers
- Link Extraction:
  - Improved logic to handle special cases for `click.linksynergy.com` URLs, ensuring all valid links are captured by checking both the `murl=` and `RD_PARM1` parameters.
- Nonce Extraction and Processing:
  - Corrected the extraction of the JSON string containing the nonce from the script tag in the `cv` function, and updated the processing to properly isolate and parse the JSON data, ensuring successful AJAX requests for fetching course data.
- Error Handling:
  - Enhanced error handling across all functions to ensure the script continues processing remaining items if an error occurs, with retries for network requests and handling cases where required elements might not be found.
- Progress Tracking:
  - Refined progress tracking within each scraper function to provide accurate updates on the scraping process.
- Threading:
  - Utilized threading to parallelize scraping tasks, ensuring efficient processing of multiple sites.
- Data Aggregation:
  - Improved the aggregation of scraped data into a unified list, maintaining consistency in the format and structure of the results.
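
The link extraction point above can be illustrated with a short sketch. It assumes the target link is carried percent-encoded in either the `murl` or the `RD_PARM1` query parameter of a `click.linksynergy.com` URL; the `extract_target_link` name and the sample URLs in the usage are hypothetical, and the scraper's real logic may differ.

```python
from urllib.parse import parse_qs, urlparse


def extract_target_link(url):
    """Return the wrapped target link of a linksynergy URL.

    URLs on other hosts are returned unchanged; None is returned when a
    linksynergy URL carries neither of the expected parameters.
    """
    parsed = urlparse(url)
    if parsed.netloc.lower() != "click.linksynergy.com":
        return url
    query = parse_qs(parsed.query)
    for key in ("murl", "RD_PARM1"):  # check both parameters, in this order
        values = query.get(key)
        if values:
            return values[0]  # parse_qs has already percent-decoded the value
    return None
```

For example, `extract_target_link("https://click.linksynergy.com/deeplink?id=abc&murl=https%3A%2F%2Fexample.com%2Fcourse%2Fx%2F")` yields `https://example.com/course/x/`.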
These changes collectively improve the overall reliability, efficiency, and functionality of the script.
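
The threading, error handling, and aggregation points can be sketched together as follows. This is an illustrative outline under stated assumptions rather than the script's actual code: the three `scrape_site_*` functions are hypothetical placeholders, and `run_scrapers` is a made-up helper name.

```python
import threading


def scrape_site_a():
    """Hypothetical scraper that succeeds."""
    return [{"title": "Course A", "link": "https://example.com/a"}]


def scrape_site_b():
    """Hypothetical scraper that fails."""
    raise RuntimeError("site unavailable")


def scrape_site_c():
    """Hypothetical scraper that succeeds."""
    return [{"title": "Course C", "link": "https://example.com/c"}]


def run_scrapers(scrapers):
    """Run each scraper in its own thread and merge results into one list."""
    results, errors = [], {}
    lock = threading.Lock()

    def worker(name, func):
        try:
            items = func()
        except Exception as exc:  # a failing site must not stop the others
            with lock:
                errors[name] = str(exc)
            return
        with lock:
            # Every entry gets the same keys so the merged list stays uniform.
            results.extend({"site": name, **item} for item in items)

    threads = [
        threading.Thread(target=worker, args=(name, func))
        for name, func in scrapers.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, errors
```

The order of the merged list depends on which thread finishes first, so callers that need a stable order should sort the results afterwards.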
Previous Changes (Commits on Jun 10, 2024):
- Fixed Cloudscraper Session Error:
  - Resolved the error that occurred when creating a scraper session from the existing session `s` with `cloudscraper.create_scraper(sess=s)`.
- Repaired Scrapers for Multiple Sites:
  - Fixed the scrapers for Discudemy, Coursevania, and iDownloadCoupon. Previously, the script was unable to start due to issues with these scrapers.
- Corrected e-Next API Link:
  - Updated the script with the correct link for the e-next API, ensuring proper API interaction.
The script is now functioning correctly.
However, there are a couple of minor issues that need attention:
- The script sometimes keeps retrying. I have attached a screenshot of this behavior for reference.
- The `tqdm` progress bar is slightly glitchy and repeats the name of the website. The backend operations still work correctly, so this issue affects only the display of the progress bar.
Note: I have only tested these changes with the CLI version and have not verified them with the GUI version.
I am not familiar with the GUI and have not tried running it, because I mostly use the CLI version. As for PySimpleGUI, you can register as a "Hobbyist" to get a developer key that can be used for a year... check it out here
These are the versions of my libraries from the requirements:
bs4 0.0.2
cloudscraper 1.2.71
colorama 0.4.6
html5lib 1.1
requests 2.31.0
requests-file 2.0.0
requests-toolbelt 1.0.0
tqdm 4.66.4
You can check and verify your versions with the above by running this command in PowerShell:
pip list | findstr /R "bs4 requests html5lib cloudscraper pyopenssl browser_cookie3 colorama tqdm"
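
As a cross-platform alternative to the `findstr` pipeline, the same check could be done from Python with the standard-library `importlib.metadata` module (Python 3.8 or later). This is only a sketch; the `installed_versions` helper is a hypothetical name, and the package list simply mirrors the versions listed above.

```python
from importlib.metadata import PackageNotFoundError, version

PACKAGES = [
    "bs4", "cloudscraper", "colorama", "html5lib",
    "requests", "requests-file", "requests-toolbelt", "tqdm",
]


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(PACKAGES).items():
        print(f"{name:20} {ver or 'not installed'}")
```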
Please review these changes and let me know if any further modifications are needed.