Scraping-Scripts
All of my scraping scripts
Disclaimer
These scripts are provided as-is. I assume no liability for any damage caused by using them.
Usage
Almost every script needs the functions.py file.
pantyhoseplaza.com.py
Edit the script to set your download directory. The script takes three parameters:
- The page number of the Movie Episodes category
- The username to use to log in
- The password to use to log in
The script downloads the data as JSON, plus the thumbnail and the videos in all qualities. If you have any suggestions, submit a pull request!
If any errors pop up, rerun the script. If they keep occurring, submit an issue. Thanks :)
wotd.dictionary.com.py
Requested by /u/nsq1
Three different usages:
- All until today: python wotd.dictionary.com.py "/mnt/what/ever/directory/"
- Specific date: python wotd.dictionary.com.py "/mnt/what/ever/directory/" yyyy/mm/dd -single
- Date range: python wotd.dictionary.com.py "/mnt/what/ever/directory/" yyyy/mm/dd yyyy/mm/dd
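The three modes above can be told apart by the number of arguments after the directory. Below is a minimal Python sketch of that dispatch under stated assumptions, not the script's actual code: `parse_args`, `parse_day` and `days` are hypothetical names, and treating the "all until today" mode as "from the earliest available date up to today" is an assumption.

```python
from datetime import date, datetime, timedelta
from typing import Iterator, List, Optional, Tuple

DATE_FMT = "%Y/%m/%d"  # the yyyy/mm/dd format shown in the usage lines


def parse_day(text: str) -> date:
    """Parse a yyyy/mm/dd string into a date (raises ValueError if malformed)."""
    return datetime.strptime(text, DATE_FMT).date()


def parse_args(argv: List[str], today: Optional[date] = None) -> Tuple[str, Optional[date], date]:
    """Hypothetical dispatcher: return (save_dir, start, end).

    start is None in the "all until today" mode, meaning "from the earliest
    available date" (an assumption; the real script may choose its own start).
    """
    today = today or date.today()
    if len(argv) == 1:                           # DIR
        return argv[0], None, today
    if len(argv) == 3 and argv[2] == "-single":  # DIR yyyy/mm/dd -single
        day = parse_day(argv[1])
        return argv[0], day, day
    if len(argv) == 3:                           # DIR yyyy/mm/dd yyyy/mm/dd
        start, end = parse_day(argv[1]), parse_day(argv[2])
        if start > end:
            raise ValueError("start date is after end date")
        return argv[0], start, end
    raise ValueError("expected: DIR [yyyy/mm/dd (-single | yyyy/mm/dd)]")


def days(start: date, end: date) -> Iterator[date]:
    """Yield every day from start to end, inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)
```

For example, `parse_args(["/mnt/what/ever/directory/", "2015/01/30", "2015/02/02"])` returns the directory plus both dates, and `days()` then yields the four days from 2015/01/30 through 2015/02/02.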
There is currently one known problem with the script: for any date before 2014/03/01 it gets an HTTP 403 (Forbidden) response. If anyone finds a way to fix it, submit a pull request!
thesandbornmaps.cudl.colorado.edu.py
Requested by /u/WhiskeyQuebec
Options and arguments:
- -s, --simple Constructs the simple/flat directory structure
- -h, --help Shows this text
- --from= Start at the given document number
- --to= End with the given document number
- --save-dir= Store at this location
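The option list above follows the usual getopt conventions (short `-s`/`-h`, long `--from=`-style options). Here is a minimal sketch of how such options could be parsed with Python's standard `getopt` module; it is not taken from the script. `parse_options`, `in_range` and the defaults (current directory, no bounds) are hypothetical, and treating document numbers as integers with inclusive bounds is an assumption.

```python
import getopt
from typing import Dict, List, Optional

# Hypothetical defaults; the real script may use different ones.
DEFAULTS = {"simple": False, "help": False, "from": None, "to": None, "save_dir": "."}


def parse_options(argv: List[str]) -> Dict[str, object]:
    """Parse the documented flags (raises getopt.GetoptError on unknown ones)."""
    opts, _args = getopt.getopt(argv, "sh", ["simple", "help", "from=", "to=", "save-dir="])
    settings = dict(DEFAULTS)
    for name, value in opts:
        if name in ("-s", "--simple"):
            settings["simple"] = True
        elif name in ("-h", "--help"):
            settings["help"] = True
        elif name == "--from":
            settings["from"] = int(value)  # assumes document numbers are integers
        elif name == "--to":
            settings["to"] = int(value)
        elif name == "--save-dir":
            settings["save_dir"] = value
    return settings


def in_range(doc_number: int, first: Optional[int], last: Optional[int]) -> bool:
    """True if doc_number lies within the inclusive --from/--to bounds (None = unbounded)."""
    if first is not None and doc_number < first:
        return False
    if last is not None and doc_number > last:
        return False
    return True
```

Keeping the range check separate from parsing lets the download loop skip documents outside `--from`/`--to` with a single call per document.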
wall.alphacoders.com.py
Requested by myself :P
Options and arguments:
- -h, --help Shows this printout
- --update Stops at the first item that has already been downloaded
- --save-dir= Store at this location
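The `--update` behaviour amounts to an incremental sync: walk the listing newest first and stop at the first file you already have. A minimal sketch of that idea, assuming the listing is ordered newest first and that a normal run (without `--update`) skips existing files but keeps scanning; `existing_files` and `plan_downloads` are hypothetical helpers, not the script's code.

```python
from pathlib import Path
from typing import Iterable, List, Set


def existing_files(save_dir: Path) -> Set[str]:
    """Names of files already present in save_dir (empty if the folder does not exist yet)."""
    if not save_dir.is_dir():
        return set()
    return {p.name for p in save_dir.iterdir() if p.is_file()}


def plan_downloads(newest_first: Iterable[str], existing: Set[str], update: bool) -> List[str]:
    """Hypothetical helper: pick the files to fetch from a newest-first listing."""
    to_fetch: List[str] = []
    for name in newest_first:
        if name in existing:
            if update:
                break     # --update: assume everything older was fetched on a previous run
            continue      # full run (assumed): skip this file but keep scanning
        to_fetch.append(name)
    return to_fetch
```

The early `break` is what makes `--update` cheap on repeated runs; it relies on the assumption that earlier runs never left gaps in the older part of the listing.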
thechive.com.py
Requested by /u/Broadsid3
No options or arguments yet; just run the script. I'll add them later when I have more time. Posts without a parseable date are stored in the NonParsable folder.
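The NonParsable fallback can be expressed as "try to parse the date, otherwise route to a catch-all folder". A minimal sketch, assuming a date string like "March 5, 2015" and a year/month folder layout; both the format and the layout are assumptions, and `target_dir` is a hypothetical name, not the script's code.

```python
from datetime import datetime
from pathlib import Path

DATE_FORMAT = "%B %d, %Y"  # assumed format, e.g. "March 5, 2015"; not taken from the script


def target_dir(base: Path, raw_date: str) -> Path:
    """Return base/YYYY/MM for a parseable date, else base/NonParsable."""
    try:
        posted = datetime.strptime(raw_date.strip(), DATE_FORMAT)
    except ValueError:
        return base / "NonParsable"  # no valid date format: catch-all folder
    return base / f"{posted:%Y}" / f"{posted:%m}"
```

Note that `%B` matches month names in the current locale, so this sketch assumes an English (or C) locale.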
Donate
If you like my work and want to donate, here's the button! :) Actually, there is no button: I have a personal PayPal account and can't set one up. Here's the email instead: [email protected]