pluralsight-scraper
Pluralsight blocked my account
I think you could add a delay of 1 or 2 minutes between video downloads. However long the delay is, I think we can live with it.
My account was also blocked due to high activity. I increased the delay to 180 seconds (3 minutes) and it works for me.
How to increase the delay? @azhrzafar
@mhelmi In the index.js file, look for this line: await wait(30000 * index);
Change the 30000 (milliseconds) to any duration you like. The default is 30 seconds, as the comment there indicates.
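To make the arithmetic concrete, here is a minimal sketch of what that line does. It assumes wait is a promise-based sleep helper that takes milliseconds; its real definition in index.js is not shown in this thread. DELAY_MS and delayFor are hypothetical names used only for illustration.

```javascript
// Assumed shape of the helper: resolves after `ms` milliseconds.
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical constant standing in for the hard-coded 30000 (30 seconds).
const DELAY_MS = 3 * 60 * 1000; // 3 minutes

// The delay is multiplied by the index, so item 0 waits 0 ms,
// item 1 waits DELAY_MS, item 2 waits 2 * DELAY_MS, and so on.
const delayFor = (index, delayMs = DELAY_MS) => delayMs * index;

// In the scraper this would read: await wait(delayFor(index));
```

With the original value, delayFor(3, 30000) is 90000 ms, so the fourth item waits 90 seconds. With 3 minutes it waits 9 minutes.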
Funny thing: I increased mine to 3 minutes and got blocked anyway, although it took longer than before. Now I am blocked for the second time and I do not know what to write to Pluralsight. I doubt they would unblock it this time.
Anyway, they still block you even if you change the delay to 5 minutes.
The good news is that there's still time to create new accounts. The bad news is that the current naive timeout strategy is definitely not working. I'm curious about how they detect it.
- Do we need a user agent?
- Random timing?
I guess we could try both.
Or a random download sequence.
Also blocked...
It seems they are watching for the presence of additional tracking requests.
Yeah, I think so; they could be watching for a lot of things. Guys, there is a way to download courses without stress now: just use the Pluralsight app downloader, then decrypt using this https://github.com/mrvogiacu/Decrypt-PluralSight-Videos-GUI
Try installing puppeteer-extra.
@siriokun Have you tested how much of an impact the stealth mode has?
I have tested it and it seems to be working (with stealth mode and the delay increased to 2 minutes).
Lovely. Would you like to send a PR with your changes?
Sure #25
Can you take a look at this Python script? It works perfectly. https://github.com/rojter-tech/pluradl.py
As for https://github.com/mrvogiacu/Decrypt-PluralSight-Videos-GUI, be careful: since mid-April Pluralsight has broken their own client (on Windows at least). Videos can't be played back in their own client (uh oh...). They seem to be using a new encryption class, and I guess they forgot to switch the playback decryption?
Otherwise, here is some feedback from someone who has been experimenting on his own and still has a valid account despite downloading a lot of courses. (Overall I downloaded the equivalent of 300 courses' metadata/videos; I had to download them multiple times.)
- I always scraped data using puppeteer by navigating to the pages an end user would visit and then intercepting the content from the XHR requests (so I did not explicitly navigate to the viewclip page, for example).
- The exception is the player page: I never navigate to it, since the video file URL is available on the overall course page.
- I also tried to download metadata/video from the Pluralsight cache (from their official application, which uses an API).
I even forgot to add throttling a couple of times (though I was processing one HTTP page at a time, so that acted like throttling).
BTW puppeteer-extra seems nice, thanks!
I think Pluralsight might be on to you.
Decrypt is 404...
Pluralsight's legal team is upset with me for allowing links to the decrypter to be posted. They also wanted me to take down this project. So let's not link to it anymore, as it gets me unneeded attention.
I'm sure you'll be able to find it online if you look for it enough.
I may have a solution to avoid detection by Pluralsight: allow the use of already created cookies to connect. If your program works the same way as the others do, they detect the fact that it doesn't solve captchas. When they see you trying to connect for the 4th time in a single day, they add a captcha and, if your program can't solve it (it could be done, but it would be really time-consuming), they assume that you are using a bot. They also keep a list of disposable email address providers to prevent you from using some of them.
To avoid the captcha thing, a VPN service provider would be your first solution but the connection via cookies could definitely help you lay low.
By the way, if you want to run this in background (e.g. when sleeping), you need to increase the delay between the end of download n and the start of download n+1. A randomized delay between 20 and 40 minutes might do the trick.
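As a rough illustration of that suggestion, here is a minimal sketch of a randomized pause between the end of one task and the start of the next. All names here (randomDelayMs, runSequentially, MINUTE_MS) are hypothetical and not taken from the project. The 20 to 40 minute range is just the figure mentioned above, not a tested value.

```javascript
const MINUTE_MS = 60 * 1000;

// Returns a delay in milliseconds drawn uniformly from
// [minMinutes, maxMinutes). `random` is injectable for testing.
function randomDelayMs(minMinutes = 20, maxMinutes = 40, random = Math.random) {
  const span = (maxMinutes - minMinutes) * MINUTE_MS;
  return Math.floor(minMinutes * MINUTE_MS + random() * span);
}

// Promise-based sleep helper.
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs async tasks strictly one after another, pausing between the end
// of task n and the start of task n+1 (no pause after the last task).
async function runSequentially(tasks, nextDelayMs = randomDelayMs) {
  for (let i = 0; i < tasks.length; i++) {
    await tasks[i]();
    if (i < tasks.length - 1) {
      await wait(nextDelayMs());
    }
  }
}
```

Unlike a delay multiplied by the index, this measures the pause from when the previous task finishes, so a slow task does not shrink the gap before the next one.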
I don't really know how your program works, since I stopped using this kind of program last month: I got everything I wanted (including courses that I paid for on HB). But some instructors (hey Kate Gregory) are willing to update their courses, so it might come in handy.
I didn't open a new issue to help you lay low.