
Parser fails to parse links which are not valid regex

Open haynesgt opened this issue 1 year ago • 0 comments

Expected Behavior

Parser.parse("https://ahrefs.com/jobs/clickhouse-c++-developer") should succeed instead of throwing.

Current Behavior

SyntaxError: Invalid regular expression: /^https://ahrefs.com/jobs/clickhouse-c++-developer/i: Nothing to repeat
    at new RegExp (<anonymous>)
    at makeBaseRegex (node_modules/@postlight/parser/dist/mercury.js:7403:10)
    at scoreLinks (node_modules/@postlight/parser/dist/mercury.js:7419:19)

Steps to Reproduce

await (Parser = require("@postlight/parser")).parse("https://ahrefs.com/jobs/clickhouse-c++-developer")

Detailed Description

Seen in version 2.2.3. Judging by the stack trace, the base URL is passed to new RegExp without escaping here:

https://github.com/postlight/parser/blob/e8ba7ece291efa4d915d50dd4deeec17d54359f2/src/extractors/generic/next-page-url/scoring/score-links.js#L20
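The failure looks reproducible without the library at all. Below is a minimal sketch, assuming the linked line interpolates the raw base URL into the pattern (which is what the stack trace suggests). tryBuild is a hypothetical helper name used only for illustration, not something from the parser's code.

```javascript
// Hypothetical helper (not part of @postlight/parser): builds the regex
// from the raw, unescaped URL and reports whether construction succeeded.
function tryBuild(baseUrl) {
  try {
    return { ok: true, regex: new RegExp(`^${baseUrl}`, 'i') };
  } catch (err) {
    return { ok: false, error: err };
  }
}

// The "++" in the path is read as two stacked quantifiers, which is invalid.
const result = tryBuild('https://ahrefs.com/jobs/clickhouse-c++-developer');
console.log(result.ok);          // false
console.log(result.error.name);  // SyntaxError
```

So any URL containing a sequence that is invalid as regex syntax would probably hit the same error, not just this one.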

Possible Solution

function makeBaseRegex(baseUrl) {
  // Escape regex metacharacters so the URL is matched literally
  var escapedUrl = baseUrl.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp(`^${escapedUrl}`, 'i');
}
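A quick way to sanity-check the escaping idea outside the library is sketched below. The makeBaseRegex here is a standalone copy for testing and not the library's actual source, and escapeRegex is a hypothetical helper name introduced for this example.

```javascript
// Hypothetical helper: backslash-escape every character that has special
// meaning in a JavaScript regular expression.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Standalone copy of the proposed fix, for testing outside the library.
function makeBaseRegex(baseUrl) {
  return new RegExp(`^${escapeRegex(baseUrl)}`, 'i');
}

const baseRegex = makeBaseRegex('https://ahrefs.com/jobs/clickhouse-c++-developer');

// The URL now matches itself as a literal prefix.
console.log(baseRegex.test('https://ahrefs.com/jobs/clickhouse-c++-developer?page=2')); // true

// "." is escaped too, so it no longer matches an arbitrary character.
console.log(baseRegex.test('https://ahrefsXcom/jobs/clickhouse-c++-developer')); // false
```

Escaping the whole URL also makes the dots in the hostname literal, which the unescaped version would have treated as wildcards.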

haynesgt, May 16 '24 18:05