
Rework notebooks to use the static self-hosted fake job board

Open martin-martin opened this issue 2 years ago • 0 comments

indeed.com has tightened its bot protection against web scraping, so the requests to the site that this course describes now return 403 Forbidden status codes.

I've attempted to circumvent this using fake headers (something that would be explainable in an intro course), but no luck: the 403 prevails.
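For reference, the fake-header attempt mentioned above can be sketched roughly as follows. This is a minimal illustration, not the code from the course: it uses the standard library's `urllib.request` instead of `requests` to stay dependency-free, and the header values and the `build_request` / `fetch_status` helper names are assumptions made up for this example.

```python
import urllib.error
import urllib.request

# Hypothetical browser-like headers; the exact values are an assumption.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
    "Accept-Language": "en-US,en;q=0.9",
}


def build_request(url, headers=None):
    """Create a GET request that carries browser-like headers."""
    return urllib.request.Request(url, headers=headers or BROWSER_HEADERS)


def fetch_status(url):
    """Return the HTTP status code for url, including error codes such as 403."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # urlopen raises on 4xx/5xx responses; the code is on the exception.
        return error.code
```

Calling `fetch_status()` on a site with strict bot protection may still return 403, which matches what I saw: spoofed headers alone are often not enough.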

I've previously reworked the written tutorial to use a self-hosted fake job board that I set up just for the purpose of the tutorial.

As a quick fix for the video course, I added an explanatory lesson and reworked the Jupyter notebooks.

The information and processes that I explain in the rest of the course are still valid and a good introduction to scraping a static website.
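To illustrate what scraping a static page boils down to, here is a small sketch that pulls job titles out of already-downloaded HTML. The markup in `SAMPLE_HTML` (including the `title` class) is a made-up stand-in, not the actual structure of the fake job board, and the sketch uses the standard library's `html.parser` rather than the tooling from the notebooks.

```python
from html.parser import HTMLParser

# Hypothetical static markup; the real job board's HTML may differ.
SAMPLE_HTML = """
<div class="card"><h2 class="title">Python Developer</h2><p>Example Corp</p></div>
<div class="card"><h2 class="title">Data Engineer</h2><p>Sample LLC</p></div>
"""


class TitleParser(HTMLParser):
    """Collect the text of every <h2 class="title"> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples.
        if tag == "h2" and ("class", "title") in attrs:
            self._in_title = True

    def handle_data(self, data):
        if self._in_title:
            self.titles.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_title = False


def extract_titles(html):
    """Return all job titles found in the given HTML string."""
    parser = TitleParser()
    parser.feed(html)
    return parser.titles
```

Because the page is static, everything of interest is already in the HTML response, so no JavaScript rendering is needed before parsing.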

Where to put new files:

  • New files should go into a top-level subfolder, named after the article slug. For example: my-awesome-article

How to merge your changes:

  1. Make sure the CI code style tests all pass (+ run the automatic code formatter if necessary).
  2. Find an RP Team member on Slack and ask them to review & approve your PR.
  3. Once the PR has one positive ("approved") review, GitHub lets you merge the PR.
  4. 🎉

martin-martin · Dec 21 '22 13:12