stash
[Bug Report] XPath Scraper shouldn't remove newlines for Detail fields
Describe the bug
Currently, when a scene scraper is run, the newlines in the resulting Details field are removed.
To Reproduce
Steps to reproduce the behavior:
- Go to the edit tab on a scene
- Enter the URL of a scene that has a multiline description and a scraper associated with it
- Scrape the scene details
- Profit
Expected behavior
Since Details is presented as a multiline textbox, I would expect newlines to survive scraping.
Stash Version: v0.1.1-167-gdc5efb9
This was already mentioned in the Discord channel and here: https://github.com/stashapp/CommunityScrapers/pull/49. The problem is that the XPath code applies common post-processing to every field, which collapses multiple spaces and removes newlines. For the Details field, I think we can skip the newline ("\n") removal.
To complete this, the scene details panel in the UI also needs the pre class defined in the CSS:

    .pre {
      white-space: pre-line;
    }
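A minimal Go sketch of the field-specific post-processing suggested above. The function name postProcess and the keepNewlines flag are hypothetical, not stash's actual code. With the flag off, it collapses all whitespace, including newlines (the current behavior described above). With it on, it collapses spaces and tabs within each line but keeps the line breaks, which white-space: pre-line can then render.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Hypothetical helpers for illustration; not taken from stash.
var (
	anyWhitespace = regexp.MustCompile(`\s+`)    // spaces, tabs, newlines
	horizontalWS  = regexp.MustCompile(`[ \t]+`) // spaces and tabs only
)

// postProcess trims the value and collapses whitespace. When keepNewlines
// is true (e.g. for a Details field), line breaks survive: each line is
// trimmed and runs of spaces/tabs inside it become a single space.
func postProcess(value string, keepNewlines bool) string {
	if !keepNewlines {
		return strings.TrimSpace(anyWhitespace.ReplaceAllString(value, " "))
	}
	value = strings.ReplaceAll(value, "\r\n", "\n")
	lines := strings.Split(value, "\n")
	for i, line := range lines {
		lines[i] = strings.TrimSpace(horizontalWS.ReplaceAllString(line, " "))
	}
	return strings.TrimSpace(strings.Join(lines, "\n"))
}

func main() {
	raw := "  First   paragraph.\n\n  Second\tparagraph.  "
	fmt.Printf("%q\n", postProcess(raw, false)) // "First paragraph. Second paragraph."
	fmt.Printf("%q\n", postProcess(raw, true))  // "First paragraph.\n\nSecond paragraph."
}
```

Other fields would keep calling it with keepNewlines set to false, so only Details changes behavior.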
As advised, I had some code for this that wasn't working as I wanted, but I forgot to revisit it. I'll have another look when I can.
related to #579
@bnkai is this still reproducible?
@WithoutPants I wouldn't say it's reproducible, since I don't have a test sample available, but it's not 100% resolved yet. The nodeText function, which processes every field, still removes newlines. #579 works for newlines that are added by the user or are part of an element attribute, but not for newlines that have already been processed by the nodeText function.
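A hypothetical sketch of why the order of steps matters here, assuming text extraction (standing in for nodeText) collapses newlines before any later step runs. A later step then only sees newlines it introduces itself, such as a user-configured replace. Both helper names and the " || " marker are made up for illustration and are not stash's code or configuration.

```go
package main

import (
	"fmt"
	"strings"
)

// extractCollapsed stands in for a text-extraction step that collapses all
// whitespace, including "\n", into single spaces (hypothetical).
func extractCollapsed(raw string) string {
	return strings.Join(strings.Fields(raw), " ")
}

// userReplace stands in for a user-configured post-processing replace that
// turns a marker into a newline (hypothetical marker " || ").
func userReplace(s string) string {
	return strings.ReplaceAll(s, " || ", "\n")
}

func main() {
	raw := "Line one\nLine two || Line three"
	// The source newline is already gone when userReplace runs; only the
	// newline it adds itself survives.
	fmt.Printf("%q\n", userReplace(extractCollapsed(raw))) // "Line one Line two\nLine three"
}
```

Preserving source newlines therefore has to happen in the extraction step itself, not in a later one.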
Closed as presumably stale. If we're still getting this, we can reopen or open a new issue.
@WithoutPants Please see comments in this issue: https://github.com/stashapp/CommunityScrapers/issues/123#issuecomment-2013769621 https://github.com/stashapp/CommunityScrapers/issues/123#issuecomment-2014098481
There is still an issue with how the nodeText function scrapes text when there are newlines in the HTML. I provided instructions to reproduce it. Can this issue be reopened?
Scraper: BluMedia.yml
When scraping scenes from collegedudes.com, the Details selector (under cdScraper) seems to join text that is broken up by line breaks in the HTML source without any space between the words. For instance, on https://www.collegedudes.com/play/MTgz/cody-busts-a-nut the text "around to watch us" becomes "around towatch us".
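One plausible way to get "around towatch us" is to delete line breaks outright instead of treating them as whitespace. This hypothetical Go sketch contrasts the two approaches. The helper names are made up, and nothing here is taken from BluMedia.yml or stash. The second approach roughly matches how a browser renders the same source under the default white-space: normal.

```go
package main

import (
	"fmt"
	"strings"
)

// dropNewlines deletes line breaks outright, gluing the words on either
// side together (hypothetical; reproduces the reported symptom).
func dropNewlines(s string) string {
	s = strings.ReplaceAll(s, "\r", "")
	return strings.ReplaceAll(s, "\n", "")
}

// collapseWhitespace treats every run of whitespace, including line breaks,
// as a single space.
func collapseWhitespace(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

func main() {
	src := "around to\nwatch us"
	fmt.Println(dropNewlines(src))       // around towatch us
	fmt.Println(collapseWhitespace(src)) // around to watch us
}
```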