edx-scrape
Scrape the HTML from edX
What is this
edx-dl is a wonderful tool for pulling down all the videos and PDFs from an edX course. Unfortunately, it is not currently set up to download any HTML content (see #600).
Until edx-dl can download HTML itself, I've made a little hacky script that downloads it for you.
If you want to download PDFs of all the Q/A, check out edx-archive.
Usage
- Copy the code from index.js.
- Open your browser and go to the "Progress" page of the course.
- Example: https://courses.edx.org/courses/course-v1:MITx+6.431x+3T2018/progress
- Open the console in your browser's devtools (see your browser's documentation for instructions).
- Paste the code from index.js into the console and press Enter.
- The script will automatically download the HTML of every page listed on the Progress page and output a zip file.
- Unzip the file, then open the "pages" folder in your browser.
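The real logic lives in index.js and may work differently. As a rough illustration of the download step only, here is a minimal sketch that assumes the Progress page links to each unit with anchors whose href contains "/courseware/" (an assumption, not checked against edX's markup). The helpers progressPageLinks, pageFileName and downloadPages are hypothetical names. The sketch fetches each page with the logged-in session and returns a map of file names to HTML; packaging that map into a zip is left out.

```javascript
// Hypothetical: collect unique unit URLs from the Progress page.
// The "/courseware/" selector is an assumption about edX's markup.
function progressPageLinks(doc = document) {
  const anchors = doc.querySelectorAll('a[href*="/courseware/"]');
  return [...new Set(Array.from(anchors, (a) => a.href))];
}

// Hypothetical: turn a page URL into a flat, filesystem-safe file name.
function pageFileName(url) {
  const { pathname } = new URL(url);
  const slug = pathname
    .split("/")
    .filter(Boolean) // drop empty segments from leading/trailing slashes
    .join("_")
    .replace(/[^A-Za-z0-9._-]/g, "-"); // e.g. ":" and "+" become "-"
  return `${slug || "index"}.html`;
}

// Fetch the pages one at a time and collect { fileName: html } pairs.
// Run from the console on courses.edx.org, same-origin requests carry
// the browser's edX session cookies.
async function downloadPages(urls, fetchFn = globalThis.fetch) {
  const pages = {};
  for (const url of urls) {
    const response = await fetchFn(url, { credentials: "same-origin" });
    if (!response.ok) {
      console.warn(`Skipping ${url}: HTTP ${response.status}`);
      continue;
    }
    pages[pageFileName(url)] = await response.text();
  }
  return pages;
}

// Example (in the devtools console on the Progress page):
// downloadPages(progressPageLinks()).then((pages) => console.log(Object.keys(pages)));
```

Fetching sequentially is slower than firing every request at once, but it keeps the load on edX's servers low and makes a failed page easy to spot in the console.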
Limits
Currently, this script downloads only the raw HTML. It does not grab:
- Images
- Results of the "Show Answers" button
To view the downloaded pages offline, the few JS scripts they rely on must already be cached in your browser.
Navigating between pages won't work.
Contributing
I'd much rather you contribute to the edx-dl project (see #600). But if you'd like to improve this script, open a ticket and we can chat.