whereis-whoishiring-hiring
Update scraping to support HN's layout changes
Three parts to this PR:
- I pinned the package versions in the requirements file because I couldn't run the project otherwise. Some packages could probably be updated, but I didn't look into it.
- Scraping changes due to HN updates
  - The comment text `span` is now a `div`.
  - The post selector had to be constrained (class `comtr`) because it was picking up table rows that contained pagination links.
- Pagination support
  - The scraper now follows "More" links on a page, if there are any, and keeps scraping until it reaches the end of the comments.
  - The new logic is in `extract_jobs_from_thread(s)`.
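A minimal standard-library sketch of the two scraping ideas above: only take comment text from rows with class `comtr`, and keep following the "More" link until a page has none. The markup it expects (`<tr class="athing comtr">` rows, `<div class="commtext">` text, `<a class="morelink">`) is an assumption about HN's current HTML, and `ThreadPageParser`, `scrape_thread_pages`, `fetch`, and `max_pages` are hypothetical names; this is not the project's actual `extract_jobs_from_thread(s)`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ThreadPageParser(HTMLParser):
    """Collects comment text and the "More" link from one HN thread page.

    Assumed (hypothetical) markup: comments live in <tr class="... comtr ...">
    rows, their text in <div class="commtext ...">, and the next-page link is
    <a class="morelink" href="...">.
    """

    def __init__(self):
        super().__init__()
        self.comments = []      # text of each comment found on this page
        self.more_href = None   # relative href of the "More" link, if any
        self._comtr_depth = 0   # >0 while inside a comtr row (counts nested <tr>)
        self._text_depth = 0    # >0 while inside the commtext <div>
        self._buf = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "tr":
            if self._comtr_depth:
                self._comtr_depth += 1      # nested row inside a comment row
            elif "comtr" in classes:
                self._comtr_depth = 1       # start of a comment row
        elif tag == "div" and self._text_depth:
            self._text_depth += 1           # div nested inside the comment text
        elif tag == "div" and self._comtr_depth and "commtext" in classes:
            # Only commtext inside a comtr row counts; pagination rows are ignored.
            self._text_depth, self._buf = 1, []
        elif tag == "p" and self._text_depth:
            self._buf.append("\n")          # paragraphs are separated by <p>
        elif tag == "a" and "morelink" in classes:
            self.more_href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "tr" and self._comtr_depth:
            self._comtr_depth -= 1
        elif tag == "div" and self._text_depth:
            self._text_depth -= 1
            if self._text_depth == 0:
                self.comments.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self._text_depth:
            self._buf.append(data)


def scrape_thread_pages(first_url, fetch, max_pages=50):
    """Follow "More" links from first_url; fetch(url) must return page HTML."""
    comments, url, pages = [], first_url, 0
    while url and pages < max_pages:  # max_pages guards against link loops
        parser = ThreadPageParser()
        parser.feed(fetch(url))
        parser.close()
        comments.extend(parser.comments)
        # Relative "More" hrefs are resolved against the current page URL.
        url = urljoin(url, parser.more_href) if parser.more_href else None
        pages += 1
    return comments
```

In practice `fetch` could be something like `lambda u: urllib.request.urlopen(u).read().decode()`. Injecting it keeps the loop testable offline, and the `max_pages` cap stops a malformed "More" link from looping forever.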
Disclaimer: When I tested this, it eventually timed out after successfully processing 7 months of posts. Perhaps it hit a rate limit, or it was just bad luck. You might have to load the production database one month at a time to catch up to the present month.
Sidenote: I didn't know this was broken until I recently needed the "Who's hiring?" thread again. Thanks for making it! :slightly_smiling_face:
fixes #4
@cooperra niiiice thanks for doing this! 🙏 - it's good to see this super-old side project getting some love :)
I'll try to merge and get the site back up later this week and I'll let you know.