Parsing lead_image_url when there are multiple og:image's present

Open TLadd opened this issue 5 years ago • 0 comments

  • Platform: OS X
  • Mercury Parser Version: 2.2.0
  • Node Version (if a Node bug): v12.16.2

Expected Behavior

If a site sets og:image twice, the parser should choose one of them as the lead_image_url. Duplicate og:image tags are clearly a mistake on the site's part, but I would still like the image to be parsed out in this scenario.

Current Behavior

It chooses neither of the og:image values and falls back to another image on the page.

Steps to Reproduce

const MercuryParser = require("@postlight/mercury-parser");
// Wrapped in an async function because top-level await is not available in a CommonJS script
(async () => {
  const x = await MercuryParser.parse("https://www.realityblurred.com/realitytv/2017/08/ayto-season-six-host-terrence-j/"); // Any page with two `og:image`'s set
  console.log(x.lead_image_url);
})();

This prints https://www.realityblurred.com/realitytv/wp-content/themes/realityblurred/images/Andy-Dehnart.jpg, which is the first image in the body of the page. The page does have an og:image, but the identical tag is specified twice in the head:

<meta property="og:image" content="https://www.realityblurred.com/realitytv/images/2017/08/ayto-season-six-cast.jpg">

Detailed Description

I'm trying to get the lead image url out of the above page.

Possible Solution

If there are multiple og:image's present in a page, choose the first one.
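To illustrate the "choose the first one" idea, here is a rough standalone sketch. It is not Mercury Parser's actual extraction code: `firstOgImage` is a hypothetical helper name, and it scans the raw HTML with regular expressions purely to keep the example dependency-free (a real fix would presumably work on the parsed DOM instead).

```javascript
// Hypothetical helper, not part of Mercury Parser's API.
// Returns the content of the first non-empty og:image meta tag, or null.
function firstOgImage(html) {
  // Collect every <meta ...> tag in document order.
  const tags = html.match(/<meta\b[^>]*>/gi) || [];
  for (const tag of tags) {
    const isOgImage = /\bproperty\s*=\s*["']og:image["']/i.test(tag);
    const content = /\bcontent\s*=\s*["']([^"']*)["']/i.exec(tag);
    // Stop at the first og:image that carries a usable value,
    // ignoring any later duplicates.
    if (isOgImage && content && content[1].trim() !== "") {
      return content[1].trim();
    }
  }
  return null;
}

// Example: the same og:image tag specified twice in the head.
const tag =
  '<meta property="og:image" content="https://www.realityblurred.com/realitytv/images/2017/08/ayto-season-six-cast.jpg">';
console.log(firstOgImage("<head>" + tag + tag + "</head>"));
```

With this approach, duplicate tags no longer matter: whether the values are identical or different, the first usable one is returned.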

TLadd avatar Nov 20 '20 23:11 TLadd