[Bug Report] XPath Scraper shouldn't remove newlines for Detail fields

Open compound-dumbo opened this issue 4 years ago • 6 comments

Describe the bug Currently, when a scene scraper is run, the resulting Detail field's newlines get removed.

To Reproduce Steps to reproduce the behavior:

  1. Go to the edit tab on a scene
  2. Fill in the URL of a scene that has a multiline description and an associated scraper
  3. Scrape the scene details
  4. Profit

Expected behavior Since Detail is presented as a multiline textbox, I would expect newlines to survive.

Stash Version: v0.1.1-167-gdc5efb9

compound-dumbo avatar Jun 02 '20 20:06 compound-dumbo

This was already mentioned in the Discord channel and here: https://github.com/stashapp/CommunityScrapers/pull/49. The problem is that the XPath code applies common post-processing to every field, which removes repeated spaces and newlines. For the details field, I think we can skip the newline ("\n") removal. To complete this, the scene details panel in the UI needs the pre class defined in the CSS:

.pre {
  white-space: pre-line;
}

Was advised; I had some code that wasn't working as I wanted, but I forgot to revisit it. I'll have another look when I can.
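To illustrate the idea of skipping newline removal only for the details field, here is a minimal Python sketch. It is not the actual stash code (which is written in Go); the function name `clean_field` and the `keep_newlines` flag are hypothetical and only show the intended behavior.

```python
import re


def clean_field(text: str, keep_newlines: bool = False) -> str:
    """Hypothetical stand-in for the scraper's common post-processing."""
    if keep_newlines:
        # Collapse runs of spaces and tabs on each line, but keep the line breaks.
        lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
        return "\n".join(lines).strip()
    # Default: collapse every run of whitespace, newlines included, into one space.
    return re.sub(r"\s+", " ", text).strip()


# Invented sample text with repeated spaces and a blank line between paragraphs.
raw = "First  paragraph.\n\nSecond   paragraph.\n"

title_style = clean_field(raw)                        # newlines are flattened
details_style = clean_field(raw, keep_newlines=True)  # newlines survive
```

With something like this, only the details field would be called with `keep_newlines=True`, and the `white-space: pre-line` rule above would make the preserved line breaks visible in the UI.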

bnkai avatar Jun 02 '20 21:06 bnkai

Related to #579

bnkai avatar Jun 06 '20 10:06 bnkai

@bnkai is this still reproducible?

WithoutPants avatar Aug 19 '20 05:08 WithoutPants

@WithoutPants I wouldn't say reproducible, since I don't have a test sample available, but it's not yet 100% resolved. The nodeText function that processes every field still removes newlines. #579 works for newlines that are added by the user or are part of an element attribute, but not for newlines that have already been processed by the nodeText function.

bnkai avatar Aug 19 '20 14:08 bnkai

Closed as presumably stale. If we're still getting this, we can reopen or open a new issue.

WithoutPants avatar Aug 30 '21 01:08 WithoutPants

@WithoutPants Please see comments in this issue: https://github.com/stashapp/CommunityScrapers/issues/123#issuecomment-2013769621 https://github.com/stashapp/CommunityScrapers/issues/123#issuecomment-2014098481

There is still an issue with how the nodeText function scrapes text when there are newlines in the HTML. I provided instructions to reproduce it. Can this issue be reopened?

Scraper BluMedia.yml

When scraping scenes from collegedudes.com, the Details selector (under cdScraper) seems to be smashing together text broken up with line breaks in the HTML source, so that there's no space separating the words. For instance, on https://www.collegedudes.com/play/MTgz/cody-busts-a-nut the text that says "around to watch us" becomes "around towatch us".
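The symptom can be reproduced in isolation with a small Python sketch. This is not the nodeText implementation; it assumes, as a guess at the cause, that line breaks are deleted outright rather than replaced with a space. Both helper names are hypothetical.

```python
import re


def strip_newlines(text: str) -> str:
    # Assumed faulty behavior: the newline is removed with nothing in its place.
    return text.replace("\n", "")


def newlines_to_spaces(text: str) -> str:
    # Alternative: replace each line break (and any whitespace around it) with one space.
    return re.sub(r"\s*\n\s*", " ", text)


# The phrase from the report, with a line break where the HTML source wraps.
source_text = "around to\nwatch us"

broken = strip_newlines(source_text)
fixed = newlines_to_spaces(source_text)
```

Here `broken` matches the reported "around towatch us", while `fixed` keeps the words separated.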

bkbd3177 avatar Mar 22 '24 14:03 bkbd3177