CommunityScrapers
Improve Model searching for Pornhub
Hey, I have noticed that various models on Pornhub are not found. This is because the original Pornhub.yml file uses this endpoint to search for models: "https://www.pornhub.org/pornstars/search?search={}". This doesn't show all models, just the ones that are listed as pornstars. There are also users who upload content but are not listed as pornstars. By trying it out, I found "Lana Bee", for example.
However, I have found that this endpoint correctly returns the user: "https://www.pornhub.org/user/search?username={}"
So this needs some improvement. I thought I could add this myself by changing the first few lines to this:
name: Pornhub
performerByName:
  - action: scrapeXPath
    queryURL: https://www.pornhub.com/pornstars/search?search={}
    scraper: performerSearch
  - action: scrapeXPath
    queryURL: https://www.pornhub.org/user/search?username={}
    scraper: modelSearch
but apparently the scraper doesn't handle lists very well (it doesn't at all, actually; it just throws a bunch of unmarshal errors). So there needs to be another way to add the second one.
Edit: I have opened this as an enhancement, but it might also be a bug. I am not sure.
The search should look something like this:
modelSearch:
  performer:
    Name: //span[@class="usernameBadgesWrapper"]/a[@class="usernameLink"]/text()
    URL:
      selector: //span[@class="usernameBadgesWrapper"]/a[@class="usernameLink"]/@href
      postProcess:
        - replace:
            - regex: ^
              with: "https://www.pornhub.org"
Though I am unable to get this to work, as "performerByName" only accepts one action, queryURL and scraper.
Also, it would be good to change the behaviour of scraping scenes on Pornhub here. If I scrape a scene from a user rather than a performer, the user is added as a studio rather than as a performer. This is wrong.
> but apparently the scraper doesn't handle lists very well (it doesn't at all, actually; it just throws a bunch of unmarshal errors). So there needs to be another way to add the second one.
Scrapers currently only support one search per type, so we'll have to choose between pornstar and model. Personally, I lean towards model, since that's probably what users go to Pornhub for, and more mainstream pornstars can be scraped from other sources.
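As a rough sketch of what choosing the model search could look like: as far as I can tell, `performerByName` takes a single mapping rather than a YAML list, which would explain the unmarshal errors. The fragment below is untested. It reuses the user search endpoint reported in this thread and assumes a `modelSearch` XPath scraper is defined elsewhere in the same file.

```yaml
name: Pornhub
performerByName:
  # One mapping only; a "- action: ..." list here fails to unmarshal
  action: scrapeXPath
  queryURL: https://www.pornhub.org/user/search?username={}
  # Assumed to be defined under xPathScrapers in the same file
  scraper: modelSearch
```

With this change the pornstar search is dropped from this file entirely, which is the trade-off described above.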
> Also, it would be good to change the behaviour of scraping scenes on Pornhub here. If I scrape a scene from a user rather than a performer, the user is added as a studio rather than as a performer. This is wrong.
Can you expand on this, perhaps with an example URL?
> Also, it would be good to change the behaviour of scraping scenes on Pornhub here. If I scrape a scene from a user rather than a performer, the user is added as a studio rather than as a performer. This is wrong.
Studio is technically correct; there is a performers box that just isn't used, and that's how it's handled on StashDB.
I think it might be better to have a Pornhub-Models fork that handles these edge cases differently and will also do /model search.
> > Also, it would be good to change the behaviour of scraping scenes on Pornhub here. If I scrape a scene from a user rather than a performer, the user is added as a studio rather than as a performer. This is wrong.
> Can you expand on this, perhaps with an example URL?
Sure. For example, when I try to scrape the video https://www.pornhub.org/view_video.php?viewkey=665908d7b8b52 using Stash and the Pornhub scraper, it scrapes the user (in this case "Anna Cherry7") as the studio instead of as a performer. As a result, I now have a whole bunch of studios (around 200-300) that are really performers, because they were scraped incorrectly.
> > Also, it would be good to change the behaviour of scraping scenes on Pornhub here. If I scrape a scene from a user rather than a performer, the user is added as a studio rather than as a performer. This is wrong.
> Studio is technically correct; there is a performers box that just isn't used, and that's how it's handled on StashDB.
> I think it might be better to have a Pornhub-Models fork that handles these edge cases differently and will also do /model search.
In my opinion, the default should be to search for users rather than pornstars, as there are a lot more users than pornstars. Most pornstars can be scraped better using other scrapers such as StashDB, but most users don't exist on other sites.
Otherwise, having the option to run multiple searches for performers would be useful. For example, the Pornhub scraper could first search for the user as a pornstar and, if it doesn't find them, retry searching for the user as a performer.
> In my opinion, the default should be to search for users rather than pornstars, as there are a lot more users than pornstars. Most pornstars can be scraped better using other scrapers such as StashDB, but most users don't exist on other sites.
While I agree, the guidelines for StashDB dictate that the performer be the studio, with most third-party content being invalid.
> Otherwise, having the option to run multiple searches for performers would be useful. For example, the Pornhub scraper could first search for the user as a pornstar and, if it doesn't find them, retry searching for the user as a performer.
This would be great, but it's not possible with our current scraper architecture. I still think having two separate scrapers would be best.
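For illustration, a separate model-only scraper could be a second YAML file along these lines. This is an untested sketch, not the scraper that was eventually implemented: the name `Pornhub-Models` is taken from the fork suggested earlier in the thread, the file name is hypothetical, and the XPath selectors are the ones proposed in the original post, so they are assumptions about the page markup rather than verified selectors.

```yaml
# Pornhub-Models.yml (hypothetical file name)
name: Pornhub-Models
performerByName:
  action: scrapeXPath
  # User search endpoint reported in this thread
  queryURL: https://www.pornhub.org/user/search?username={}
  scraper: modelSearch
xPathScrapers:
  modelSearch:
    performer:
      Name: //span[@class="usernameBadgesWrapper"]/a[@class="usernameLink"]/text()
      URL:
        selector: //span[@class="usernameBadgesWrapper"]/a[@class="usernameLink"]/@href
        postProcess:
          # Prefix the relative href with the site origin
          - replace:
              - regex: ^
                with: "https://www.pornhub.org"
```

The existing Pornhub scraper would keep the pornstar search, so each file still has only one search per type.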
https://www.pornhub.com/view_video.php?viewkey=66ff9acd04ebb
Just to add onto this, is there any way to have a link like the one above pull the performers? It pulls the channel properly but neither of the two performers. This was just the first one on the page when I opened it.
Was there ever a way to switch to searching by Model rather than Pornstar? I dug around a bit, and 'Pornstar' is very subjective.
https://www.pornhub.com/pornstar/scarlet-chase Take this one: there is no direct link to the performer from it, but if you go to any of the videos, the performer link points back to the same "Pornstar" page, making a circle with no information.
Even though the performer is https://www.pornhub.com/model/secretcrush which does not appear in the Pornstar category.
Yet in reverse we have https://www.pornhub.com/pornstar/rae-lil-black who is labeled as a Pornstar with no model page.
From what I can figure out, it depends on how they started. If their first appearance was being added by a Channel, they became Pornstars; if they themselves took over that title or had their own model page first, they are Models.
I tried to modify it myself to search for models, but I ended up breaking it. Can anyone swap it to models and/or split off just a model search?
Hey @feederbox826, thanks for implementing this, though I have a question now. As I stated earlier, I now have a bunch of studios that are not studios but models. I am fine with the guidelines dictating that each scene needs a studio and that the studio is the performer. But if no performer is present, which is the case for every Pornhub scene that comes from a model and not from a pornstar with a studio, the performer is left empty. That leaves me with quite a lot of studios that are actually performers, and no way to scrape the missing performers and add them to their associated scenes. Or am I missing something?
It would be great to have such functionality, though I am not sure whether this feature request belongs here or in the stashapp repository, or whether anything can be done at all.
Also, I noticed that when scraping a Pornhub model, for example TheFoxAlina, nothing is returned from the scraper. This might be a network issue or something else; I am not sure, but I wanted to mention it.
> As I stated earlier, I now have a bunch of studios that are not studios but models. I am fine with the guidelines dictating that each scene needs a studio and that the studio is the performer. But if no performer is present, which is the case for every Pornhub scene that comes from a model and not from a pornstar with a studio, the performer is left empty. That leaves me with quite a lot of studios that are actually performers, and no way to scrape the missing performers and add them to their associated scenes. Or am I missing something?
I looked into this. The problem is that the current scraper does some weird regex parsing to get the performers out of the performers box that PH provides, so adding the selector for the uploader would break it.