ripme
500px now rips non-watermarked images
The 500px ripper now rips images without a watermark on them, closing issue #491. There are still some issues with the ripper (it takes a long while to start ripping and doesn't save the image titles), but those can be fixed later.
Test link http://500px.com/david-foto
My log btw: https://pastebin.com/ZpCgqdFC
@metaprime
Looks good overall. Still managed to get an adult content placeholder image
Also it looks like after one rip of the example link I exceeded the rate limit, so I can't test again.
It looks like there have been some changes to the site since I wrote the ripper. I'll get on fixing these.
Maybe it's best to avoid using images = doc.select("meta[property=og:image]"); completely, so we don't rely on <meta og:image... at all.
Then this check can be discarded: if (imageURL.contains("https://500px.com/graphics/nude/img_3"))
Because this placeholder URL could be different, or could change any time.
Instead, always extract the target URL(s) from here:
for (Element script : doc.select("head > script")) {
    if (script.html().contains("window.PxPreloadedData")) {
        ........
Because that script element with window.PxPreloadedData should always be present.
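To make the suggestion above concrete, here is a minimal sketch of pulling the preloaded data out of that script's text once it has been located. It uses only the JDK, so it takes the string that script.html() would return. The class name PreloadedDataExtractor is hypothetical, and the sketch assumes (this is not confirmed in the thread) that the script assigns a JSON-style object literal with double-quoted strings to window.PxPreloadedData.

```java
import java.util.Optional;

// Hypothetical helper, not part of ripme. Assumes the script body contains
// something like: window.PxPreloadedData = { ...JSON object... };
final class PreloadedDataExtractor {
    private static final String MARKER = "window.PxPreloadedData";

    /** Returns the object literal that follows the marker, if one is found. */
    static Optional<String> extract(String scriptBody) {
        int marker = scriptBody.indexOf(MARKER);
        if (marker < 0) {
            return Optional.empty();
        }
        int start = scriptBody.indexOf('{', marker);
        if (start < 0) {
            return Optional.empty();
        }

        int depth = 0;
        boolean inString = false;
        boolean escaped = false;
        for (int i = start; i < scriptBody.length(); i++) {
            char c = scriptBody.charAt(i);
            if (inString) {
                // Braces inside double-quoted strings must not affect nesting depth.
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
            } else if (c == '{') {
                depth++;
            } else if (c == '}') {
                depth--;
                if (depth == 0) {
                    return Optional.of(scriptBody.substring(start, i + 1));
                }
            }
        }
        return Optional.empty(); // braces never balanced
    }

    public static void main(String[] args) {
        String script = "window.PxPreloadedData = {\"photo\": {\"id\": 1}};";
        System.out.println(extract(script).orElse("not found"));
    }
}
```

The extracted string would still need to be parsed with a JSON library to reach the image URLs; the field names inside it are not given in this thread, so they are left out here.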
@cyian-1756 any update on this one?
They implemented some insane rate limiting (I was still getting IP banned after waiting 10 seconds between requests), so I haven't really been able to do much testing, as I get banned pretty much instantly.
Maybe we need to make the wait interval long and slightly randomized to get around bot-detection?
^ Getting insta-banned at 10 seconds is already a lot, so the base waiting time would have to be something like 15 or 20 seconds at minimum, with at least a 5-10 second range of randomization... And even that might not be enough.
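A rough sketch of the randomized wait discussed here, using only the JDK. The class and method names are hypothetical, and the numbers (15 second base plus 5 to 10 seconds of jitter) are just the guesses from this thread. Whether any interval actually avoids the ban is untested.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper, not part of ripme.
final class RandomizedDelay {

    /** Returns baseMillis plus a uniformly random jitter in [minJitterMillis, maxJitterMillis]. */
    static long nextDelayMillis(long baseMillis, long minJitterMillis, long maxJitterMillis) {
        if (baseMillis < 0 || minJitterMillis < 0 || maxJitterMillis < minJitterMillis) {
            throw new IllegalArgumentException("delays must be non-negative and min <= max");
        }
        // nextLong's upper bound is exclusive, so add 1 to include maxJitterMillis.
        long jitter = ThreadLocalRandom.current().nextLong(minJitterMillis, maxJitterMillis + 1);
        return baseMillis + jitter;
    }

    /** Sleeps 20 to 25 seconds in total: a 15 s base plus 5-10 s of jitter (assumed values). */
    static void sleepBeforeNextRequest() throws InterruptedException {
        Thread.sleep(nextDelayMillis(15_000, 5_000, 10_000));
    }

    public static void main(String[] args) {
        // Print a few sample delays instead of actually sleeping.
        for (int i = 0; i < 3; i++) {
            System.out.println(nextDelayMillis(15_000, 5_000, 10_000) + " ms");
        }
    }
}
```

Calling sleepBeforeNextRequest() before each page fetch would space requests unevenly, which is the idea being floated; the base and jitter values would likely need tuning against the site's actual limits.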
Tbh I'm very surprised how strict the rate limiting they suddenly implemented is.
> Maybe we need to make the wait interval long and slightly randomized to get around bot-detection?
That might work, I'll look into it.
> Tbh I'm very surprised how strict the rate limiting they suddenly implemented is.
I wouldn't be shocked if they did it to combat ripme, considering it went into effect pretty much right after I fixed this ripper and added watermark-free ripping.