feat: arstechnica.com extractor
This is an extractor for arstechnica.com. A few notes:
- I removed the `contentOnly: true` option from `extractorOpts` in `collect-all-pages.js` because it resulted in `next_page_url` always being null on the second page of an article.
- Articles from this site are often paginated, but I was unable to write a CSS selector that reliably finds the next page: on the last page, the link matching such a selector points to the previous page instead. However, the parser appears to find the next page on its own, without this extractor supplying it, as long as the `fallback` option is left at its default value of `true`.
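
To illustrate the fallback behavior described in the second note, here is a minimal sketch. It is an assumption-laden model, not the parser's actual code: `resolveNextPageUrl`, its arguments, and the example URL are all hypothetical names made up for illustration. It only shows the idea that a missing custom result can be replaced by a generic one when `fallback` is `true`.

```javascript
// Hypothetical model of the fallback idea; not the parser's real internals.
// customResult:  what a site-specific extractor found (null if nothing).
// genericResult: what a generic, site-agnostic extractor found.
function resolveNextPageUrl(customResult, genericResult, { fallback = true } = {}) {
  // Prefer the site-specific result whenever there is one.
  if (customResult) return customResult;
  // Otherwise use the generic result only if fallback is enabled.
  return fallback ? genericResult : null;
}

// This extractor supplies no next-page selector, so its result is null.
const customResult = null;
const genericResult = 'https://arstechnica.com/example-article/2/'; // made-up URL

console.log(resolveNextPageUrl(customResult, genericResult));
console.log(resolveNextPageUrl(customResult, genericResult, { fallback: false }));
```

Under this model, disabling `fallback` would leave the next page unresolved for this site, which is why the option should stay at its default here.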