easy-scraper

Ignore tags in target locations

Open • SFrav opened this issue 4 years ago • 1 comment

I have two challenging cases:

  1. Tags within a target location.

Raw doc:

<h class="name" Baz="key here"> this is the text <sub>we</sub> want </h>

Note that the target text is not quoted; it sits directly between the opening and closing tags.

Pattern:

r##"<h class="name" Baz={{key}}> {{this}} </h>"##

Can we ignore the nested tags (such as <sub>) and capture the whole text?

  2. A variable number of author target tags in a document.

<h class="bar" baz="one"> <span itemprop="name">bla</span> </h>
<h class="bar" baz="two"> <span itemprop="name">bla</span> <span itemprop="name">foo</span> </h>

Pattern:

r##"<h class="bar" baz={{key}}> <span itemprop="name">{{auth}}</span> </h>"##

Can we take only the first itemprop span?

Both examples produce multiple matches. A workaround could be to combine the matches based on a common key, but this wouldn’t work in all cases.
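A minimal sketch of the common-key workaround for the second case, written in Rust since easy-scraper is a Rust crate. It assumes each match is a map from placeholder name to captured string (my understanding of what easy-scraper's `Pattern::matches` returns) and that matches arrive in document order. The `Match` alias and the `first_per_key` and `row` helpers are hypothetical names, not part of the crate.

```rust
use std::collections::{BTreeMap, HashSet};

// Assumed shape of one match: placeholder name -> captured text.
type Match = BTreeMap<String, String>;

/// Keeps only the first match for each distinct value of `key_field`,
/// in the order the matches are given. Matches without that field are skipped.
fn first_per_key(matches: &[Match], key_field: &str) -> Vec<Match> {
    let mut seen: HashSet<String> = HashSet::new();
    let mut out = Vec::new();
    for m in matches {
        if let Some(k) = m.get(key_field) {
            // `insert` returns false when this key was already seen.
            if seen.insert(k.clone()) {
                out.push(m.clone());
            }
        }
    }
    out
}

// Hypothetical helper that builds one match by hand for the demo.
fn row(key: &str, auth: &str) -> Match {
    let mut m = Match::new();
    m.insert("key".to_string(), key.to_string());
    m.insert("auth".to_string(), auth.to_string());
    m
}

fn main() {
    // Hand-written stand-ins for the matches of the second example:
    // baz="two" contains two spans, so key "two" appears twice.
    let matches = vec![row("one", "bla"), row("two", "bla"), row("two", "foo")];
    for m in first_per_key(&matches, "key") {
        println!("{} -> {}", m["key"], m["auth"]);
    }
}
```

This prints `one -> bla` and `two -> bla`, dropping `foo`. It only yields the first itemprop if the matches really are in document order, which is the part that may not hold in all cases.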

SFrav, Apr 06 '20 17:04

OK, I read through the code and found some possible solutions.

r##"<h class="name" Baz={{key}}> {{this}} </h>"##

Becomes

r##"<h class="name" Baz={{key}}> {{this:*}} </h>"##
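If the `{{this:*}}` capture hands back the subtree as markup rather than plain text (I have not confirmed which; check the crate's README), a small helper could flatten it to the text we want. This is a naive sketch, not an HTML parser; `strip_tags` is a hypothetical helper name.

```rust
/// Naive tag stripper for a captured HTML fragment (sketch only).
/// Drops every character from '<' through the next '>' and collapses
/// whitespace. It does not decode entities such as &amp;, drops any stray
/// '>' in the text, breaks on '>' inside attribute values, and can glue
/// words together across tags like <br> that have no surrounding spaces.
fn strip_tags(fragment: &str) -> String {
    let mut text = String::new();
    let mut in_tag = false;
    for c in fragment.chars() {
        match c {
            '<' => in_tag = true,
            '>' => in_tag = false,
            _ if !in_tag => text.push(c),
            _ => {} // character inside a tag: skip it
        }
    }
    // Collapse runs of whitespace into single spaces and trim the ends.
    text.split_whitespace().collect::<Vec<_>>().join(" ")
}

fn main() {
    // Inner markup of the first example, as the `:*` capture might return it.
    let captured = " this is the text <sub>we</sub> want ";
    println!("{}", strip_tags(captured)); // prints: this is the text we want
}
```

For anything beyond simple inline tags like `<sub>`, walking the parsed DOM's text nodes would be more robust than string surgery.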

And, I think,

r##"<h class="bar" baz={{key}}> <span itemprop="name">{{auth}}</span> </h>"##

becomes

r##"<h class="bar" baz={{key}}> <span itemprop="name">{{auth}}</span> ... <span itemprop="name">{{authother}}</span> </h>"##

Then some extra processing is needed to combine the result vectors that share a first author. I’m not sure how this would work when there is only one author, though.
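The two-placeholder pattern presumably only matches an `<h>` that contains at least two itemprop spans, which is likely why the one-author case is unclear. An alternative sketch, under the same assumption that each match is a name-to-string map: keep the single-placeholder pattern, which matches every span, and group all captured authors by key. The first author is then element 0, and a one-author entry simply gives a one-element vector. `group_by_key` and `row` are hypothetical helpers.

```rust
use std::collections::BTreeMap;

// Assumed shape of one match: placeholder name -> captured text.
type Match = BTreeMap<String, String>;

/// Collects every `value_field` capture under its `key_field` value.
/// Within a key, values keep the order the matches are given in;
/// the keys themselves come out sorted because BTreeMap is ordered.
fn group_by_key(
    matches: &[Match],
    key_field: &str,
    value_field: &str,
) -> BTreeMap<String, Vec<String>> {
    let mut groups: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for m in matches {
        if let (Some(k), Some(v)) = (m.get(key_field), m.get(value_field)) {
            groups.entry(k.clone()).or_default().push(v.clone());
        }
    }
    groups
}

// Hypothetical helper that builds one match by hand for the demo.
fn row(key: &str, auth: &str) -> Match {
    let mut m = Match::new();
    m.insert("key".to_string(), key.to_string());
    m.insert("auth".to_string(), auth.to_string());
    m
}

fn main() {
    let matches = vec![row("one", "bla"), row("two", "bla"), row("two", "foo")];
    for (key, authors) in group_by_key(&matches, "key", "auth") {
        // authors[0] is the first author; a one-author entry has length 1.
        println!("{}: first = {}, all = {:?}", key, authors[0], authors);
    }
}
```

This prints `one: first = bla, all = ["bla"]` and `two: first = bla, all = ["bla", "foo"]`. If document order of the keys matters, a `Vec` of pairs plus an index map would preserve it where `BTreeMap` sorts.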

SFrav, Apr 09 '20 11:04