
How to crawl all links from a website using regex? How do I write a correct regex that matches all links?

Open MindaugasVaitkus2 opened this issue 6 years ago • 2 comments

My trials:

1. --regex=((?<=src=[\"'])|(?<=href=.))(?!(http(s|)(:|%3[Aa])))[0-9A-Za-z%?&#_=+.~]([0-9A-Za-z%?&#_=+./~])*(?=['\"]

The system cannot find the file specified.

2. ((?<=src=[\"'])|(?<=href=.))(?!(http(s|)(:|%3[Aa])))([0-9A-Za-z%?&#_=+./~])*(?=['\"])

The system cannot find the file specified.
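The "The system cannot find the file specified." message is most likely not a regex error. Windows `cmd.exe` prints it when a `<` on the command line is treated as input redirection from a file that does not exist, and both patterns above contain the lookbehind `(?<=`. One way to check the pattern itself, independently of any shell, is the minimal Python sketch below. It assumes a Python-compatible regex engine (NowCrawling's own handling may differ), and the sample HTML string is made up for illustration.

```python
import re

# Second pattern from above as a Python raw string (the shell-escaped \" becomes a plain ").
RELATIVE_LINK = r"""((?<=src=["'])|(?<=href=.))(?!(http(s|)(:|%3[Aa])))([0-9A-Za-z%?&#_=+./~])*(?=['"])"""

# Hypothetical sample page: one relative href, one relative src, one absolute https link.
SAMPLE_HTML = (
    '<a href="/about.html">About</a> '
    "<img src='img/logo.png'> "
    '<a href="https://example.com/x">ext</a>'
)


def relative_links(html: str) -> list[str]:
    """Return every non-empty match; the pattern has capture groups, so use group(0)."""
    return [m.group(0) for m in re.finditer(RELATIVE_LINK, html) if m.group(0)]


if __name__ == "__main__":
    # The negative lookahead skips the https link, so only the two relative links remain.
    print(relative_links(SAMPLE_HTML))  # ['/about.html', 'img/logo.png']
```

Because of the `(?!(http(s|)...))` lookahead, this pattern only collects relative links, so absolute `http://` and `https://` links are deliberately left out.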

MindaugasVaitkus2, Aug 26 '18 16:08

I used the regex below, which works perfectly (from https://www.regextester.com/97812):

(http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?
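This pattern works best for checking whether a single string looks like a URL, not for extracting links from a page. The trailing `(\/.*)?` is greedy and runs to the end of the line, and the character classes accept lowercase letters only. The minimal Python sketch below shows both behaviours. It uses Python's `re` as an assumption, and the sample strings are made up.

```python
import re

# The pattern quoted above (from regextester.com/97812), unchanged.
URL_PATTERN = re.compile(
    r"(http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?"
)


def looks_like_url(candidate: str) -> bool:
    """True if the whole string matches the pattern (not just a substring)."""
    return URL_PATTERN.fullmatch(candidate) is not None


if __name__ == "__main__":
    for s in ["https://www.regextester.com/97812", "example.org:8080/path",
              "HTTPS://EXAMPLE.COM", "not a url"]:
        print(s, looks_like_url(s))  # True, True, False (uppercase), False

    # Searching inside HTML: (\/.*)? swallows the rest of the line, quote and tags included.
    html = '<a href="https://example.com/a">x</a>'
    print(URL_PATTERN.search(html).group(0))  # https://example.com/a">x</a>
```

To crawl links from HTML with this pattern, you would first need to isolate each `href`/`src` value and then check it with `fullmatch`.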

MindaugasVaitkus2, Aug 26 '18 16:08

How do I use this regex on the command line?

meysam1366, Jan 30 '20 13:01
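On a POSIX shell such as bash or zsh, a common approach is to wrap the whole `--regex=...` argument in single quotes so that `(`, `|`, `<` and `&` reach the program untouched. Windows `cmd.exe` does not treat single quotes as quoting, so this sketch does not cover it. Python's standard `shlex.quote` builds such an argument, and `shlex.split` confirms that a POSIX shell would hand back exactly the original string. Only the `--regex=` flag comes from this thread; the rest of the crawler's command line is not shown.

```python
import shlex

# Pattern from the comment above; the --regex= flag is taken from the first trial.
PATTERN = r"(http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?"


def regex_argument(pattern: str) -> str:
    """Build a --regex=... argument that a POSIX shell passes through unchanged."""
    return shlex.quote("--regex=" + pattern)


if __name__ == "__main__":
    arg = regex_argument(PATTERN)
    print(arg)  # single-quoted; append it to the crawler command in bash/zsh
    # Round trip: shell-style splitting yields exactly one, unmodified argument.
    assert shlex.split(arg) == ["--regex=" + PATTERN]
```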