GlassDoor support (fix and re-enable)

Open PaulMcInnis opened this issue 3 years ago • 6 comments

Issue

Description

Currently we get the second page of Glassdoor results via the URL of the "2" button, but this no longer works because that URL now redirects to the first page. This is the case whether or not we use the webdriver.

Steps to Reproduce

  1. Navigate to https://www.glassdoor.ca/Job/waterloo-python-jobs-SRCH_IL.0,8_IC2280158_KO9,15.htm?radius=12&p=2

Expected behavior

We get to the second page of jobs

Actual behavior

We are redirected to the first page during the GET, which leads to every single page of jobs being a duplicate of the first page, with loads of TFIDF duplicate detection hits.

If you click the "2" button yourself, you get a toast about subscribing to email notifications, and are then navigated to the second page.
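
One way to catch this early, instead of relying on the duplicate filter downstream, might be to compare the page number that was requested with the page number in the URL we actually land on. The sketch below is not JobFunnel code: the helper names are hypothetical, and it assumes the redirect target either drops the `p` query parameter or resets it to 1, which has not been verified against Glassdoor.

```python
from urllib.parse import urlsplit, parse_qs


def requested_page(url: str, param: str = "p") -> int:
    """Return the page number carried in the query string, defaulting to 1.

    `param="p"` matches the URL in the steps above; treating a missing or
    non-numeric value as page 1 is an assumption.
    """
    values = parse_qs(urlsplit(url).query).get(param)
    if not values:
        return 1
    try:
        return int(values[0])
    except ValueError:
        return 1


def redirected_to_first_page(requested_url: str, final_url: str) -> bool:
    """True when a page > 1 was requested but the final URL points at page 1."""
    return requested_page(requested_url) > 1 and requested_page(final_url) == 1
```

With `requests`, the final URL after redirects is available as `response.url`, so a scraper could stop paginating (or switch to the webdriver) as soon as `redirected_to_first_page(url, response.url)` returns true. If the redirect keeps `p=2` in the URL while still serving page-one content, this check would miss it and the job listings themselves would need to be compared.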

Environment

  • Build: current development, or the branch on #85
  • Operating system and version: Ubuntu 20.04
  • [Linux] Desktop Environment and/or Window Manager: Chrome

PaulMcInnis avatar Aug 21 '20 01:08 PaulMcInnis

If anyone has knowledge of react or javascript, I would super appreciate the help!

PaulMcInnis avatar Aug 24 '20 22:08 PaulMcInnis

I made an isolated script that does just that. You can find it here: https://github.com/Zenahr/selenium-glassdoor-page-jumper Feel free to use it. (I made it specifically for JobFunnel and this issue but I haven't made a PR yet since I ran into a lot of import issues)

Zenahr avatar Aug 28 '20 17:08 Zenahr

This is great! Thanks to your effort I can add back the Glassdoor scraper.

I will refer the driver logic to this issue and to your user name if you like. Otherwise you are welcome to contribute on the eventual PR.

PaulMcInnis avatar Aug 28 '20 18:08 PaulMcInnis

Leaving a note to myself here that we can use TravisCI to run Selenium if we follow the steps here: https://docs.travis-ci.com/user/gui-and-headless-browsers/

They have an API for this.

PaulMcInnis avatar Sep 13 '20 01:09 PaulMcInnis

Unassigning myself for now because I want to see whether there is a way to avoid using a web driver. I dislike the latency introduced by this approach, but I do recognise that there may not be another way.

In the near future I'm going to focus more on squashing bugs with the current engine around status updates and duplicates.

PaulMcInnis avatar Sep 30 '20 12:09 PaulMcInnis

@PaulMcInnis does this issue still need help on the backend, integrating the script that @Zenahr created and sidestepping the Glassdoor captcha? I only recently found this repo and am still learning to use it, but I can help contribute.

datatalking avatar Mar 13 '23 01:03 datatalking