feat: Add a custom extractor for www.engadget.com.
Engadget articles have dates, but I was unable to find one in a format I could parse. There are strings like "2h ago" and tags with blank values such as this:
<meta class="swiftype" name="published_at" data-type="date" value="">
So the extractor always returns a null date.
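One possible direction for the relative strings, sketched here and not part of this PR: a string like "2h ago" could be converted to an approximate timestamp by subtracting the offset from the time the page was fetched. `parseRelativeDate` and the `UNIT_MS` table are hypothetical names, and the `m` and `d` units are assumptions, since only the "2h ago" form was seen. The result is only as accurate as the fetch time, which is an argument for keeping the null date.

```javascript
// Hypothetical helper, not part of this PR: converts strings such as
// "2h ago" into an ISO 8601 timestamp relative to a reference time.
// Only the "h" form was observed; "m" and "d" are assumed by analogy.
const UNIT_MS = {
  m: 60 * 1000,
  h: 60 * 60 * 1000,
  d: 24 * 60 * 60 * 1000,
};

function parseRelativeDate(text, now = new Date()) {
  const match = /^\s*(\d+)\s*([mhd])\s+ago\s*$/i.exec(text);
  if (!match) return null; // unrecognized format: fall back to a null date
  const amount = Number(match[1]);
  const unit = match[2].toLowerCase();
  return new Date(now.getTime() - amount * UNIT_MS[unit]).toISOString();
}

// Usage with an arbitrary reference time chosen for illustration:
// parseRelativeDate('2h ago', new Date('2020-04-24T12:00:00.000Z'))
// returns '2020-04-24T10:00:00.000Z'
```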
Engadget articles also have lead images, but I was unable to extract the image URL. For example, the fixture has:
<meta value="https://o.aolcdn.com/images/dims?resize=1200%2C630&crop=1200%2C630%2C0%2C0&quality=80&image_uri=https%3A%2F%2Fs.yimg.com%2Fos%2Fcreatr-images%2F2020-04%2F7e5e3a50-8658-11ea-befb-f52e76d9e7b2&client=amp-blogside-v2&signature=193a0258fa9a401d2f1cdfc41909ac01e4db3147" name="og:image">
If I put a simpler URL in that value, I could select the image. I think the &image sequence in the URL is messing things up. I did incorporate lead images into the HTML content.
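A possible explanation, stated as an assumption and not verified against the parser: `&image;` is the HTML named character reference for U+2111 (ℑ), so an entity decoder that accepts the name without its trailing semicolon would turn `&image_uri=` into `ℑ_uri=` and corrupt the URL. If that is what happens, the decoded value could be repaired after selection. The sketch below is plain JavaScript; `repairImageUrl` is a hypothetical name, and it assumes a legitimate image URL never contains a literal U+2111.

```javascript
// Hypothetical workaround, not part of this PR: undo an unwanted
// "&image" -> U+2111 entity decoding in an already-extracted URL.
const BLACK_LETTER_CAPITAL_I = '\u2111';

function repairImageUrl(url) {
  // split/join replaces every occurrence without needing a regex
  return url.split(BLACK_LETTER_CAPITAL_I).join('&image');
}

// Usage with a shortened form of the fixture URL:
// repairImageUrl('https://o.aolcdn.com/images/dims?quality=80\u2111_uri=x')
// returns 'https://o.aolcdn.com/images/dims?quality=80&image_uri=x'
```

URLs that do not contain the character pass through unchanged, so applying the function to every extracted lead image URL should be harmless under the stated assumption.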
If a reviewer knows a good way to address either of these issues, I am eager to make the change.