dashing
Regex negative lookahead not supported
TL;DR
I need to improve the results of a specific CSS selector that returns items I don't want, based on a word in the captured text, in this case "release". I don't want anything with that word in the source documentation to be indexed.
A bit wordier
In the source documentation, my selector not only returns a nice list of "Sections", but it also returns about 200 "release notes" sections, with the same css query selector.
Essentially I have a bunch of these I want to get rid of:
2020 Release Notes
2019 Winter Release Notes
Upgrade release-notes for xyz
I don't want those to be included in the resulting docset, so I tried my hand at the regex field to return everything not including the word release.
I essentially need the opposite of this:
^.*release.*$
So, don't return anything that has the word "release" in it.
I tried the (?!) negative lookahead in regex, but I get the message:
error parsing regexp: invalid or unsupported Perl syntax: `(?!`
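For anyone who wants to reproduce this outside Dashing, here is a minimal Go sketch. It assumes only the standard regexp package; the helper name compileError is made up for illustration. It compiles a PCRE-style "does not contain release" pattern and the positive pattern from above, and should report an error like the one quoted for the first and none for the second.

```go
package main

import (
	"fmt"
	"regexp"
)

// compileError (hypothetical helper) returns the error text produced when
// pattern is compiled with Go's regexp package, or "" if it compiles.
func compileError(pattern string) string {
	if _, err := regexp.Compile(pattern); err != nil {
		return err.Error()
	}
	return ""
}

func main() {
	// PCRE-style negative lookahead: rejected by Go's regexp package.
	fmt.Println(compileError(`^(?!.*release).*$`))

	// The positive form from the question compiles without error.
	fmt.Println(compileError(`^.*release.*$`) == "")
}
```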
Is there a field in the selector object for rejecting if the title contains a word? I didn't see anything for this purpose in the README:
"css selector": {
    "requiretext": "require that the text matches a regexp. If not, this node is not considered as selected",
    "type": "Dash data type",
    "attr": "Use the value of the specified attribute instead of html node text as the basis for transformation",
    "regexp": "PCRE regular expression (no need to enclose in //)",
    "replacement": "Replacement text for each match of 'regexp'",
    "matchpath": "Only files matching this regular expression will be parsed. Will match all files if not set."
}
Dashing doesn't support PCRE, despite what the README says. It uses Go's regexp package, which follows RE2 syntax, and RE2 does not support lookaround (lookahead or lookbehind).
I've hit something similar and worked around it by adding a matchpath like this, which skips any file whose path continues with "re" right after foobar/:
{
    "type": "Guide",
    "matchpath": "foobar/([^r]|r[^e]).*\\."
}
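To see what that pattern accepts, here is a small Go sketch that runs it against a few made-up paths. It assumes matchpath is applied as an unanchored match, which I haven't verified in the Dashing source. Note that it filters on the file path rather than the title, and that it also rejects unrelated paths that start with "re", such as a reference directory.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchPath is the workaround pattern after JSON unescaping
// (the JSON string "\\." becomes the regexp `\.`).
var matchPath = regexp.MustCompile(`foobar/([^r]|r[^e]).*\.`)

func main() {
	// Hypothetical file paths, for illustration only.
	paths := []string{
		"foobar/guide/index.html",        // first char is not 'r': matches
		"foobar/routing/index.html",      // 'r' followed by non-'e': matches
		"foobar/release-notes/2020.html", // starts with "re": rejected
		"foobar/reference/api.html",      // also starts with "re": rejected
	}
	for _, p := range paths {
		fmt.Printf("%-32s %v\n", p, matchPath.MatchString(p))
	}
}
```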