NowCrawling
How can I crawl all links from a website using a regex? How do I write a correct regex that matches all links?
My attempts:
1.
--regex=((?<=src=[\"'])|(?<=href=.))(?!(http(s|)(:|%3[Aa])))[0-9A-Za-z%?&#_=+.~]([0-9A-Za-z%?&#_=+./~])*(?=['\"]
Result: The system cannot find the file specified.
2.
((?<=src=[\"'])|(?<=href=.))(?!(http(s|)(:|%3[Aa])))([0-9A-Za-z%?&#_=+./~])*(?=['\"])
Result: The system cannot find the file specified.
I also tried the regex below in an online tester, where it works perfectly:
(http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?
(taken from https://www.regextester.com/97812)
How do I use this regex on the command line?
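To check a pattern outside of any shell, here is a minimal Python sketch. It is not NowCrawling's implementation: the names `URL_PATTERN`, `find_matches` and `parse_args` are hypothetical, and the only thing taken from this post is the `--regex=` flag spelling and the regextester pattern.

```python
import argparse
import re

# Pattern copied verbatim from the post (the regextester.com one).
URL_PATTERN = r"(http:\/\/|https:\/\/)?[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?"


def find_matches(pattern, text):
    """Return the full text of every match of `pattern` in `text`."""
    # finditer + group(0) is used because the pattern contains capture groups;
    # re.findall would return the groups instead of the whole match.
    return [m.group(0) for m in re.finditer(pattern, text)]


def parse_args(argv):
    """Hypothetical CLI parser that accepts --regex=PATTERN as in the post."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--regex", default=URL_PATTERN)
    return parser.parse_args(argv)


# The argument list is passed directly here, so no shell can reinterpret
# characters such as < > | & inside the pattern.
args = parse_args(['--regex=(?<=href=")[^"]*'])
links = find_matches(args.regex, '<a href="page.html">')
```

Note that on Windows `cmd.exe` the characters `<`, `>`, `|` and `&` are handled by the shell unless the argument is enclosed in double quotes. An unquoted `(?<=` is therefore probably read as an input redirection, which would be consistent with the "The system cannot find the file specified." message above. This is an assumption based on the error text, not something verified against NowCrawling.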