Fix npic false positive
Hi @smoelius! I looked into how to filter out the false positives we are encountering (#1536), which consist of:
- Complex URL schemas
- Paths for demonstration, starting with `$`
I managed to make significant progress, especially with filtering out these two.
The first notable change to the lint is the switch to the `fancy-regex` crate, which allows more flexibility by supporting look-around assertions in regular expressions. Here's a detailed breakdown of the key improvements:
- URL filtering with `fancy-regex`
  - Added `URL_REGEX` pattern `https?://\S+` to detect web URLs
  - Implemented URL range exclusion logic to avoid path checks in URLs
- Improved path pattern matching
  - Updated `PATH_REGEX` to support:
    - Dollar signs in paths (`\$`)
  - Added explicit exclusion for paths starting with:
    - URL protocols (`http://`, `https://`)
    - Web domains (`www.`)
    - Placeholder indicators (`$`)
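To make the range-exclusion idea concrete, here is a minimal sketch using only the standard library. It is not the code from this PR: the function names `find_url_ranges` and `candidate_paths` are hypothetical, the URL scan only approximates `https?://\S+` by hand instead of using `fancy-regex`, and "path-like" is simplified to "a whitespace-separated token containing `/`".

```rust
use std::ops::Range;

/// Byte ranges of `http://` / `https://` URLs in `text`.
/// Hypothetical helper: a URL is assumed to run until the next whitespace,
/// which approximates the `https?://\S+` pattern without a regex engine.
fn find_url_ranges(text: &str) -> Vec<Range<usize>> {
    let mut ranges = Vec::new();
    let mut pos = 0;
    while pos < text.len() {
        let rest = &text[pos..];
        // Earliest occurrence of either scheme in the remaining text.
        let next = ["http://", "https://"]
            .iter()
            .filter_map(|scheme| rest.find(scheme))
            .min();
        let start = match next {
            Some(offset) => pos + offset,
            None => break,
        };
        // The URL ends at the next whitespace character or at end of input.
        let len = text[start..]
            .find(char::is_whitespace)
            .unwrap_or(text.len() - start);
        ranges.push(start..start + len);
        pos = start + len;
    }
    ranges
}

/// Path-like tokens that would still be checked after the exclusions.
/// Hypothetical helper: skips tokens that overlap a URL range or that
/// start with `$` or `www.`.
fn candidate_paths(text: &str) -> Vec<&str> {
    let url_ranges = find_url_ranges(text);
    text.split_whitespace()
        .filter(|token| token.contains('/'))
        .filter(|token| !token.starts_with('$') && !token.starts_with("www."))
        .filter(|token| {
            // Byte offset of this token inside `text`.
            let start = token.as_ptr() as usize - text.as_ptr() as usize;
            let end = start + token.len();
            // Keep the token only if it overlaps no URL range.
            !url_ranges.iter().any(|r| r.start < end && start < r.end)
        })
        .collect()
}
```

For a comment such as `see https://example.com/docs/foo.rs and src/lib.rs or $DIR/path`, this sketch keeps only `src/lib.rs` as a path to check. Comparing byte ranges rather than prefixes means a URL wrapped in punctuation, e.g. `(https://example.com/a)`, is also skipped.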
These changes significantly reduce false positives while maintaining strong detection of actual broken path references. The fancy-regex crate enables complex pattern matching that would panic with the standard regex crate, particularly the negative lookbehind needed for accurate URL detection within comments.
The test suite has also been updated to verify these new cases, including:
- URLs containing path-like fragments
- Documentation comments with intentional examples
- Placeholder paths using `$DIR/path` syntax
Looking forward to your feedback!
Sorry, I've just been busy with other things. I will review this within the next few days.
> These changes significantly reduce false positives while maintaining strong detection of actual broken path references. The fancy-regex crate enables complex pattern matching that would panic with the standard regex crate, particularly the negative lookbehind needed for accurate URL detection within comments.
Thanks a lot for working on this. I'm not familiar with `fancy-regex`, but its docs suggest that lookbehind would use a syntax like `(?<=exp)` or `(?<!exp)`. I don't see either of those in the code. Am I making a mistake by looking for them?
More generally, I'm having a little trouble with the logic. Could you explain what `url_ranges` and `non_url_ranges` are, intuitively?
@augustin-v Thanks again for working on this.
I will get to reviewing #1535 soon.