Improve Model searching for Pornhub

Open Joly0 opened this issue 1 year ago • 7 comments

Hey, I have noticed that various models on Pornhub are not found. This is because the original Pornhub.yml file uses this endpoint to search for models: "https://www.pornhub.org/pornstars/search?search={}". It doesn't show all models, just the ones that are listed as pornstars. There are also users who upload content but are not listed as pornstars. By trying it out, I found "Lana Bee", for example.

However, I found that this endpoint correctly returns the user: "https://www.pornhub.org/user/search?username={}"

So this needs some improvement. I thought I could add this myself by changing the first few lines to this:

name: Pornhub
performerByName:
  - action: scrapeXPath
    queryURL: https://www.pornhub.com/pornstars/search?search={}
    scraper: performerSearch
  - action: scrapeXPath
    queryURL: https://www.pornhub.org/user/search?username={}
    scraper: modelSearch

but apparently the scraper doesn't handle lists very well (it doesn't at all, actually; it just throws a bunch of unmarshal errors). So there needs to be another way to add the second one.

Edit: I have opened this as an enhancement, but it might also be a bug. I am not sure

The search should look something like this:

  modelSearch:
    performer:
      Name: //span[@class="usernameBadgesWrapper"]/a[@class="usernameLink"]/text()
      URL:
        selector: //span[@class="usernameBadgesWrapper"]/a[@class="usernameLink"]/@href
        postProcess:
          - replace:
              - regex: ^
                with: "https://www.pornhub.org"
However, I am unable to get this to work, as "performerByName" only accepts a single action, queryURL and scraper.
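For anyone unsure what the replace step above is meant to do: the regex `^` matches the empty string at the start of the scraped href, so the replacement simply prepends the site root to a relative link. Here is a minimal Python sketch of the same idea. The helper name `absolutize` and the example path are made up for illustration and are not part of the scraper.

```python
import re

# Site root taken from the "with:" value in the snippet above.
BASE_URL = "https://www.pornhub.org"


def absolutize(href: str) -> str:
    """Prepend BASE_URL to a relative href (hypothetical helper)."""
    # "^" matches only at position 0 (no MULTILINE flag), so this
    # inserts BASE_URL once, at the very start of the string.
    return re.sub(r"^", BASE_URL, href)


# "/users/example" is a placeholder path, not a real profile.
print(absolutize("/users/example"))
```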

Joly0 avatar Aug 09 '24 22:08 Joly0

Also here it would be good to change the behaviour of scraping scenes on pornhub. If i search a scene with a user, rather than a performer, the user is added as a studio, rather than a performer. This is wrong

Joly0 avatar Aug 10 '24 00:08 Joly0

but apparently the scraper doesn't handle lists very well (it doesn't at all, actually; it just throws a bunch of unmarshal errors). So there needs to be another way to add the second one.

Scrapers currently only support one search per type, so we'll have to choose between pornstar and model: personally I lean towards model since that's probably what users go to Pornhub for and more mainstream pornstars can be scraped from other sources

Also here it would be good to change the behaviour of scraping scenes on pornhub. If i search a scene with a user, rather than a performer, the user is added as a studio, rather than a performer. This is wrong

Can you expand on this, perhaps with an example URL?

Maista6969 avatar Aug 16 '24 00:08 Maista6969

Also here it would be good to change the behaviour of scraping scenes on pornhub. If i search a scene with a user, rather than a performer, the user is added as a studio, rather than a performer. This is wrong

studio is technically correct, there is a performers box that just isn't used and that's how it's handled on StashDB.

I think it might be better to have a Pornhub-Models fork that handles these edge cases differently and will also do /model search

feederbox826 avatar Aug 31 '24 01:08 feederbox826

Also here it would be good to change the behaviour of scraping scenes on pornhub. If i search a scene with a user, rather than a performer, the user is added as a studio, rather than a performer. This is wrong

Can you expand on this, perhaps with an example URL?

Sure. For example, when I scrape this video https://www.pornhub.org/view_video.php?viewkey=665908d7b8b52 using Stash and the Pornhub scraper, it scrapes the user (in this case "Anna Cherry7") as the studio instead of as a performer. This resulted in me having a whole bunch (like 200-300) of studios that are really performers, because they were scraped wrong.

Joly0 avatar Sep 01 '24 12:09 Joly0

Also here it would be good to change the behaviour of scraping scenes on pornhub. If i search a scene with a user, rather than a performer, the user is added as a studio, rather than a performer. This is wrong

studio is technically correct, there is a performers box that just isn't used and that's how it's handled on StashDB.

I think it might be better to have a Pornhub-Models fork that handles these edge cases differently and will also do /model search

In my opinion, the default should be to search for users rather than pornstars, as there are a lot more users than pornstars. Most pornstars can be scraped better using other scrapers like StashDB, but most users don't exist on other sites.

Otherwise, having the option to run multiple searches for performers would be useful. For example, the Pornhub scraper could first search for the name as a pornstar and, if it doesn't find anything, retry the search as a user.
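To make the idea concrete, here is a rough Python sketch of that fallback order. It is only an illustration of the logic, not something the current YAML scraper format supports. The two search functions are hypothetical stand-ins that return canned data instead of calling the pornstar and user search endpoints, and the example URL is a placeholder.

```python
from typing import Callable, Dict, Iterable, List

Performer = Dict[str, str]
SearchFn = Callable[[str], List[Performer]]


def search_with_fallback(name: str, searches: Iterable[SearchFn]) -> List[Performer]:
    """Run each search in order and return the first non-empty result list."""
    for search in searches:
        results = search(name)
        if results:
            return results
    return []


def search_pornstars(name: str) -> List[Performer]:
    # Hypothetical stand-in for the pornstar search; pretends nothing is listed.
    return []


def search_users(name: str) -> List[Performer]:
    # Hypothetical stand-in for the user search; canned data, placeholder URL.
    known = {"Lana Bee": "https://example.invalid/users/lana-bee"}
    return [{"Name": name, "URL": known[name]}] if name in known else []


# Pornstar search comes first; the user search only runs if it finds nothing.
print(search_with_fallback("Lana Bee", [search_pornstars, search_users]))
```

The order of the list decides which source wins, so swapping the two functions would make the user search the default and the pornstar search the fallback.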

Joly0 avatar Sep 01 '24 12:09 Joly0

In my opinion, the default should be to search for users rather than pornstars, as there are a lot more users than pornstars. Most pornstars can be scraped better using other scrapers like StashDB, but most users don't exist on other sites.

While I agree, the guidelines for StashDB dictate that the performer be the studio, with most third-party content being invalid

Otherwise, having the option to run multiple searches for performers would be useful. For example, the Pornhub scraper could first search for the name as a pornstar and, if it doesn't find anything, retry the search as a user.

This would be great, but it's not possible with our current scraper architecture. I still think having two separate scrapers would be best.

feederbox826 avatar Sep 03 '24 20:09 feederbox826

https://www.pornhub.com/view_video.php?viewkey=66ff9acd04ebb

Just to add onto this, is there any way to have a link like above pull the performers? It pulls the channel properly but neither of the two performers. This was just the first one on the page when I opened it.

Was there ever a way to switch to searching by Model instead of Pornstar? From digging around a bit, 'Pornstar' is very subjective.

https://www.pornhub.com/pornstar/scarlet-chase Take this one: there is no direct link to the performer from it, but if you go to any of the videos, the performer link points back to the same "Pornstar" page. Thus making a circle of no information.

Even though the performer is https://www.pornhub.com/model/secretcrush which does not appear in the Pornstar category.

Yet in reverse we have https://www.pornhub.com/pornstar/rae-lil-black who is labeled as a Pornstar with no model page.

From what I can figure out, it depends on how they started. If they were first added by a Channel, they became Pornstars; if they took over that title themselves or had their own model page first, they are Models.

I tried to modify it myself to search for models, but I ended up breaking it. Can anyone swap it to models and/or split off just a model search?

Eleniatari avatar Oct 07 '24 13:10 Eleniatari

Hey @feederbox826, thanks for implementing this, though I have a question now. As I stated earlier, I have a bunch of studios now that are not studios, but models. I am fine with the guidelines dictating that each scene needs a studio and the studio is the performer. But if no performer is present, which is the case for each Pornhub scene that's from a model and not a pornstar with a studio, the performer is left empty. That leaves me with quite a lot of studios that are actually performers and no way to scrape the missing performers and add them to their associated scenes. Or am I missing something?

It would be great to have such functionality, though I am not sure whether this feature request belongs here or in the stashapp repository, or whether anything can be done at all.

I also noticed that when scraping a Pornhub model, for example TheFoxAlina, nothing is returned from the scraper. This might be a network issue or something else, I am not sure, but I wanted to mention it.

Joly0 avatar Dec 21 '24 21:12 Joly0

As I stated earlier, I have a bunch of studios now that are not studios, but models. I am fine with the guidelines dictating that each scene needs a studio and the studio is the performer. But if no performer is present, which is the case for each Pornhub scene that's from a model and not a pornstar with a studio, the performer is left empty. That leaves me with quite a lot of studios that are actually performers and no way to scrape the missing performers and add them to their associated scenes. Or am I missing something?

I looked into this. The problem is that the current scraper does some weird regex parsing to get the performers out of the performers box that Pornhub provides, so adding the selector for the uploader would break it.

feederbox826 avatar Dec 22 '24 06:12 feederbox826