crawl
Implement capture (custom scraping)
I think CSS selectors are the way to go. The crawler already has to parse each page once for its own internal scraping. If we use a CSS selector implementation that takes the net/html tree representation as input, we won't have to parse the body of each page twice.
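To make the "parse once, use the tree twice" idea concrete, here is a rough sketch rather than a proposed implementation. Everything in it is an assumption for illustration: `Node` is a hypothetical stand-in for the already-parsed net/html tree, `Scrape`, `Result`, and `matches` are made-up names, and the matcher only understands `tag`, `.class`, and `#id`. It also assumes the crawler's internal scraping is link extraction. A real version would hand the `*html.Node` to a selector library that accepts that tree (github.com/andybalholm/cascadia is one that does) instead of the toy matcher.

```go
package main

import "strings"

// Node is a hypothetical stand-in for a node of the already-parsed HTML tree
// (in the real crawler this would be *html.Node from golang.org/x/net/html).
type Node struct {
	Tag      string            // element name; empty for text nodes
	Attr     map[string]string // element attributes
	Text     string            // content, used only by text nodes
	Children []*Node
}

// Result holds what a single walk over the tree produces.
type Result struct {
	Links    []string            // href values, for the crawler itself
	Captured map[string][]string // selector -> text of each matching element
}

// matches handles only "tag", ".class" and "#id". This is an assumption to
// keep the sketch short; it is not a CSS selector implementation.
func matches(n *Node, sel string) bool {
	if n.Tag == "" {
		return false // text nodes never match
	}
	switch {
	case strings.HasPrefix(sel, "#"):
		return n.Attr["id"] == sel[1:]
	case strings.HasPrefix(sel, "."):
		for _, c := range strings.Fields(n.Attr["class"]) {
			if c == sel[1:] {
				return true
			}
		}
		return false
	default:
		return n.Tag == sel
	}
}

// text concatenates the text nodes below n in document order.
func text(n *Node) string {
	if n.Tag == "" {
		return n.Text
	}
	var b strings.Builder
	for _, c := range n.Children {
		b.WriteString(text(c))
	}
	return b.String()
}

// Scrape walks the parsed tree once, collecting links for the crawler and
// captured text for each user-supplied selector. The page body is never
// parsed a second time.
func Scrape(root *Node, selectors []string) Result {
	res := Result{Captured: make(map[string][]string)}
	var walk func(*Node)
	walk = func(n *Node) {
		if n.Tag == "a" {
			if href, ok := n.Attr["href"]; ok {
				res.Links = append(res.Links, href)
			}
		}
		for _, sel := range selectors {
			if matches(n, sel) {
				res.Captured[sel] = append(res.Captured[sel], text(n))
			}
		}
		for _, c := range n.Children {
			walk(c)
		}
	}
	walk(root)
	return res
}
```

Calling `Scrape(root, []string{".title", "#more"})` on a parsed page would return the page's links together with the text of every element matching each selector, all from the same tree.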