Scraper Target CSS Pseudo-Elements
Sorry if this isn't the appropriate place to ask this, but I've been searching off-and-on for about a week and can't find an answer.
I'm trying to create a scraper for Strokies. I'm a web developer but not experienced with Python in general. I got it mostly working. The only problem is that most of the DIVs don't have classes or IDs, so I was wondering if I could target them with something like "last-child" or "nth-of-type", etc.
Here's what I have so far; only the Title works. Hopefully someone knows whether this is possible and the proper syntax to do it.
name: "Strokies"
sceneByURL:
  - action: scrapeXPath
    url:
      - strokies.com/video/
    scraper: strokiesScraper
xPathScrapers:
  strokiesScraper:
    scene:
      Title:
        selector: //h1/text()
      Date:
        selector: //div[@class="video-info"]/div/p:nth-of-type(3)/text()
        postProcess:
          - parseDate: Jan 2, 2006
      Details:
        selector: //div[contains(@class, "video-text")]/div:nth-of-type(4)/p
        concat: "\n\n"
      Performers:
        Name: //div[contains(@class, "video-text")]/div:nth-of-type(2)/a
      Tags:
        Name: //div[contains(@class, "video-text")]/div:nth-of-type(3)/a
      Image:
        selector: //img[@class="vjs-tech"]
        postProcess:
          - replace:
              - regex: .+(?:poster=)([^"]*)
                with: $1
      Studio:
        Name:
          fixed: Strokies
Any help with that syntax would be greatly appreciated.
TIA!
Stash uses XPath selectors, not CSS selectors.
You can have a look at https://github.com/stashapp/stash/blob/develop/ui/v2.5/src/docs/en/ScraperDevelopment.md#xpath-and-json-scrapers-configuration and https://devhints.io/xpath for more details.
For example, div:nth-of-type(4) is written div[4] in XPath (positions are 1-based).
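As a rough illustration of that positional predicate, here is a small Python sketch. The markup is hypothetical (it is not copied from the site), and the standard library's ElementTree is used only because its limited XPath subset happens to include positional predicates; it says nothing about how stash evaluates XPath internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for illustration only.
SAMPLE = """
<div class="video-text">
  <div>first</div>
  <p>not a div</p>
  <div>second</div>
  <div>third</div>
  <div>fourth</div>
</div>
"""

root = ET.fromstring(SAMPLE)

# div[4] selects the fourth <div> child. The <p> sibling is not counted,
# which matches the behaviour of CSS div:nth-of-type(4).
fourth = root.find("div[4]")

# div[last()] selects the last <div> child, similar to CSS div:last-of-type.
last = root.find("div[last()]")

print(fourth.text)
print(last.text)
```

One thing to watch for in full XPath: //div[4] means "any div that is the fourth div among its siblings", whereas (//div)[4] means "the fourth div in the whole document".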
In practice we try to avoid index-based selectors where possible. In your case, something like the below:
name: "Strokies"
sceneByURL:
  - action: scrapeXPath
    url:
      - strokies.com/video/
    scraper: strokiesScraper
xPathScrapers:
  strokiesScraper:
    scene:
      Title:
        selector: //h1/text()
      Date:
        selector: //div[@class="video-info"]//p[starts-with(text(),"Added on:")]
        postProcess:
          - replace:
              - regex: '^Added on:\s*'
                with: ""
          - parseDate: Jan 2, 2006
      Details:
        selector: '//div[contains(@class, "video-text")]/div[@style="color: white;"]/p'
        concat: "\n\n"
      Performers:
        Name: //div[@class="model-tags"]/span[starts-with(text(),"Model:")]/following-sibling::a
      Tags:
        Name: //div[@class="model-tags"]/span[starts-with(text(),"Tags:")]/a
      Image:
        selector: //video/@poster
        postProcess:
          - replace:
              - regex: ^//
                with: https://
      Studio:
        Name:
          fixed: Strokies
# Last Updated March 19, 2022
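To make the postProcess steps in the config above a bit more concrete, here is a minimal Python sketch of what they amount to. The function names and sample strings are hypothetical, and this is only an approximation of the behaviour: stash itself is not written in Python, and the strptime format "%b %d, %Y" is assumed to be the closest equivalent of the Go-style layout "Jan 2, 2006".

```python
import re
from datetime import datetime


def strip_date_prefix(text: str) -> str:
    """Mimic the replace step: drop the leading 'Added on:' label."""
    return re.sub(r"^Added on:\s*", "", text)


def parse_date(text: str) -> str:
    """Mimic parseDate with layout 'Jan 2, 2006', returning YYYY-MM-DD."""
    return datetime.strptime(text, "%b %d, %Y").date().isoformat()


def fix_protocol(url: str) -> str:
    """Mimic the Image replace step: make a protocol-relative URL absolute."""
    return re.sub(r"^//", "https://", url)


# Hypothetical scraped values, for illustration only.
raw_date = "Added on: Mar 19, 2022"
raw_poster = "//cdn.example.com/poster.jpg"

date_value = parse_date(strip_date_prefix(raw_date))
poster_value = fix_protocol(raw_poster)

print(date_value)
print(poster_value)
```

The order matters: the label has to be removed before the date is parsed, which is why the replace entry comes before parseDate in the config.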
Superseded by #1247