NowCrawling How to crawl all links from website using regex? How to write correct regex for all links?

Open MindaugasVaitkus2 opened this issue 6 years ago • 2 comments

My trials:

1. --regex=((?<=src=[\"'])|(?<=href=.))(?!(http(s|)(:|%3[Aa])))[0-9A-Za-z%?&#_=+.~]([0-9A-Za-z%?&#_=+./~])*(?=['\"]
The system cannot find the file specified.

((?<=src=[\"'])|(?<=href=.))(?!(http(s|)(:|%3[Aa])))([0-9A-Za-z%?&#_=+./~])*(?=['\"])
The system cannot find the file specified.
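To check what the pattern itself matches, independent of any command line, it can be tried in a short Python script. This is only a sketch: the sample HTML is made up for illustration, and Python's `re` module may not behave identically to the regex engine the crawler uses. Only the second pattern is compiled here, because the first one as quoted is missing the closing parenthesis of its final lookahead.

```python
import re

# Second pattern from the post, copied unchanged into a raw string.
RELATIVE_LINK = re.compile(
    r"""((?<=src=[\"'])|(?<=href=.))(?!(http(s|)(:|%3[Aa])))([0-9A-Za-z%?&#_=+./~])*(?=['\"])"""
)

# Hypothetical sample markup, not taken from any real site.
sample_html = (
    '<a href="/docs/index.html">Docs</a> '
    '<img src="logo.png"> '
    '<a href="https://example.com/x">external</a>'
)

# finditer + group(0) returns the whole match; findall would return
# tuples of capture groups instead, which is not what we want here.
links = [m.group(0) for m in RELATIVE_LINK.finditer(sample_html)]

for link in links:
    print(link)
```

On this sample the pattern picks up the two relative paths and skips the absolute `https://` link, since the negative lookahead rejects values that start with `http:` or `https:`.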

Aug 26 '18 16:08 MindaugasVaitkus2

I used the regex below, which works perfectly:
(http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?
Source: https://www.regextester.com/97812
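As a rough illustration of how this pattern behaves, here is a minimal Python sketch that uses it to validate individual candidate strings. The helper name `is_url` and the sample strings are assumptions made for the example. The pattern ends in `(\/.*)?`, so when searching inside free text it would run to the end of the line; matching whole strings with `fullmatch` avoids that.

```python
import re

# Pattern from the post, copied unchanged into a raw string.
URL_PATTERN = re.compile(
    r"(http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?"
)


def is_url(candidate: str) -> bool:
    """Return True if the whole string matches the pattern (hypothetical helper)."""
    return URL_PATTERN.fullmatch(candidate) is not None


# Made-up candidates for illustration only.
candidates = [
    "https://example.com/path?q=1",
    "example.org",
    "http://localhost:8080",
    "not a url",
]

for candidate in candidates:
    print(candidate, is_url(candidate))
```

Note that the pattern is lowercase-only and requires a dot followed by a 2 to 5 letter suffix, so hosts such as `localhost` are rejected.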

Aug 26 '18 16:08 MindaugasVaitkus2

How do I use this regex on the command line?
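One possible explanation for the error quoted earlier, offered as an assumption rather than a confirmed diagnosis: in the Windows command prompt an unquoted `<` is input redirection, and "The system cannot find the file specified." is what it prints when the redirected file does not exist. The lookbehind `(?<=` contains such a `<`, and the pattern also contains `|` and `&`. A way to sidestep shell parsing entirely is to launch the program from a script with the arguments given as a list. The sketch below uses a tiny stand-in child process that just echoes its first argument; it is not NowCrawling, and only the `--regex=` flag is taken from the post.

```python
import subprocess
import sys

# Pattern from the post, kept in a raw string so nothing is altered.
PATTERN = r"""((?<=src=[\"'])|(?<=href=.))(?!(http(s|)(:|%3[Aa])))([0-9A-Za-z%?&#_=+./~])*(?=['\"])"""

# Hypothetical stand-in for the real crawler command:
# a child Python process that writes back its first argument.
echo_child = [sys.executable, "-c", "import sys; sys.stdout.write(sys.argv[1])"]

# Passing a list (and not using shell=True) means no shell interprets
# characters such as <, | or & inside the pattern.
result = subprocess.run(
    echo_child + ["--regex=" + PATTERN],
    capture_output=True,
    text=True,
    check=True,
)

received = result.stdout
print(received == "--regex=" + PATTERN)
```

To adapt this, the `echo_child` list would be replaced with the actual crawler command and its other arguments, which are not shown in this thread.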

Jan 30 '20 13:01 meysam1366
