feat: arstechnica.com extractor

Opened by jbrayton.

This adds a custom extractor for arstechnica.com. A few notes:

- I removed the `contentOnly: true` option from `extractorOpts` in `collect-all-pages.js`. With it set, `next_page_url` was always null on the second page of an article.
- Articles on this site are often paginated, but I could not write a CSS selector that reliably finds the next page. On the last page of an article, a link matching such a selector points back to the previous page, so it would be wrongly reported as the next page. However, the parser still finds the next page without help from this extractor, as long as the `fallback` option is left at its default value of `true`.

Posted by jbrayton on Apr 27, 2020 at 12:04.