[Bug Report] Scrapers that build a queryURL
There are several niggling issues with scrapers that build a `queryURL` in order to provide `sceneByFragment` functionality, and in aggregate they create a poor user experience.
For this example we can use the following scraper:
```yaml
name: StudioX
sceneByURL:
  - action: scrapeXPath
    url:
      - studiox.com/update
    scraper: sceneScraper
sceneByFragment:
  action: scrapeXPath
  queryURL: "{url}"
  scraper: sceneScraper
xPathScrapers:
  sceneScraper:
    scene:
      Title:
        fixed: Example scraper
```
The implicit requirements for this scraper mean that a scene needs to:
- Have a URL saved to the scene
- That URL needs to be the first URL
- The URL needs to match the pattern of the `sceneByURL` action
If the user is not aware of these requirements and fails to meet them, Stash will either show the cryptic error `scraper StudioX: Get "%7Burl%7D": unsupported protocol scheme` (if the URL is missing) or give a false positive with a green notification that says "No scenes found" (if the URL does not match, or the matching URL is not the first one).
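For context on where that cryptic message comes from: if the `{url}` placeholder is never replaced, the literal string `{url}` ends up being handed to Go's HTTP client, which percent-encodes the braces when it reports the error. A minimal standalone reproduction (standard library behaviour only, not Stash code):

```go
package main

import (
	"fmt"
	"net/http"
)

func main() {
	// The un-replaced template string is passed to the HTTP client as-is.
	_, err := http.Get("{url}")
	// Prints: Get "%7Burl%7D": unsupported protocol scheme ""
	// i.e. the braces of the placeholder are percent-encoded in the error.
	fmt.Println(err)
}
```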
To reproduce this we can create an empty scene and:
- select StudioX from the "Scrape with..." dropdown: first confusing error message
- add (and save) the URL `https://example.com` to this scene and select StudioX from the "Scrape with..." dropdown: "No scenes found"
- add another URL like `https://example.com/update/2024` (which matches the pattern) and scrape again: "No scenes found"
Since the queryURL can be built from several fields (`checksum`, `oshash`, `filename`, `title` and `url` in `queryURLParametersFromScene`), the first two error cases would apply to most of these fields.
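As a simplified illustration of the failure mode (not the actual `queryURLParametersFromScene` implementation; the helper and field names below are hypothetical), substitution that only replaces placeholders for fields the scene actually has will leave the literal `{field}` text behind, which then feeds straight into the error shown above:

```go
package main

import (
	"fmt"
	"strings"
)

// buildQueryURL is a hypothetical, simplified stand-in for the real
// placeholder substitution: it only replaces placeholders for which the
// scene actually has a value, so missing fields leave "{field}" behind.
func buildQueryURL(template string, sceneFields map[string]string) string {
	out := template
	for field, value := range sceneFields {
		if value != "" {
			out = strings.ReplaceAll(out, "{"+field+"}", value)
		}
	}
	return out
}

func main() {
	// A scene with no URL saved: the placeholder survives and later
	// produces the "unsupported protocol scheme" error.
	fmt.Println(buildQueryURL("{url}", map[string]string{"title": "Example"}))
	// Output: {url}
}
```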
Expected behavior
If constructing an appropriate queryURL fails, I would expect a more specific error message to help the user solve the problem: "Scraping this requires that the X, Y, Z fields be filled" or something to this effect.
If a scene has multiple values for a field (most importantly URLs), then I'd expect the scraper to try all of them until one works (or simply matches a pattern in the sceneScraper), or to return an error message like "No matching URLs found for this scraper".
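A rough sketch of that second suggestion (names are hypothetical, not Stash's actual API): iterate over every URL stored on the scene and use the first one that matches the scraper's URL patterns, instead of blindly taking the first URL.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// matchesPattern is a hypothetical stand-in for the existing sceneByURL
// pattern check (the s.config.matchesURL call in the snippet below).
func matchesPattern(url string, patterns []string) bool {
	for _, p := range patterns {
		if strings.Contains(url, p) {
			return true
		}
	}
	return false
}

// pickScrapableURL returns the first scene URL that the scraper can
// actually handle, or an explicit error if none match.
func pickScrapableURL(sceneURLs []string, patterns []string) (string, error) {
	for _, u := range sceneURLs {
		if matchesPattern(u, patterns) {
			return u, nil
		}
	}
	return "", errors.New("no matching URLs found for this scraper")
}

func main() {
	urls := []string{"https://example.com", "https://example.com/update/2024"}
	u, err := pickScrapableURL(urls, []string{"example.com/update"})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("scraping", u) // scraping https://example.com/update/2024
}
```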
Additional context
I looked at the code for scraping scenes in scraper/xpath.go, and while I'm not familiar enough with the scraper codebase to know where a fix should be applied (this would affect JSON scrapers as well; should this be pulled up a level to avoid duplication?), something like this might be a start:
```go
// After the queryURL placeholders have been substituted
// (requires the fmt, regexp and strings imports):
if !s.config.matchesURL(url, ScrapeContentTypeScene) {
	// Look for any "{field}" placeholders that were never replaced.
	re := regexp.MustCompile(`\{([^}]+)\}`)
	remainingPlaceholders := re.FindAllString(url, -1)
	if len(remainingPlaceholders) == 0 {
		// No placeholders left; the URL simply doesn't match any pattern.
		return nil, fmt.Errorf("url doesn't match scraper: %s", url)
	}
	// Strip the surrounding braces to report which scene fields are missing.
	missingReplacements := make([]string, len(remainingPlaceholders))
	for i, v := range remainingPlaceholders {
		missingReplacements[i] = v[1 : len(v)-1]
	}
	errMsg := strings.Join(missingReplacements, ", ")
	return nil, fmt.Errorf("missing fields: %s", errMsg)
}
```