ReadabiliPy

Pass the url option to the JSDOM constructor to get images and relative links fixed

Open: facundoolano opened this issue 2 years ago • 1 comment

I've noticed that the content parsed by this library keeps relative URLs for images, which prevents them from being rendered outside the original page. (See, for example, the images in this url).

As per the mozilla/readability repo:

Remember to pass the page's URI as the url option in the JSDOM constructor (as shown in the example above), so that Readability can convert relative URLs for images, hyperlinks etc. to their absolute counterparts.
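To make concrete what that conversion involves, here is a minimal sketch using only Node's built-in URL class. The page address and the helper name toAbsolute are made up for illustration. This is not Readability's own code, just the same base-URL resolution rule it relies on.

```javascript
// Hypothetical page address, used only for illustration.
const pageUrl = "https://example.com/blog/post.html";

// Resolve a possibly-relative reference against the page's URL.
// toAbsolute is an illustrative helper, not part of jsdom or Readability.
function toAbsolute(ref, base) {
  return new URL(ref, base).href;
}

console.log(toAbsolute("images/pic.png", pageUrl)); // https://example.com/blog/images/pic.png
console.log(toAbsolute("/about", pageUrl));         // https://example.com/about
console.log(toAbsolute("https://cdn.example.org/a.png", pageUrl)); // already absolute, unchanged
```

Without a base, new URL("images/pic.png") throws a TypeError, which is roughly the position a parser is in when it is handed HTML with no document URL.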

Alternatively, JSDOM.fromURL can be used, which sets the document URL itself. I confirmed with this basic script that the content comes back with fixed image URLs in that case:

#!/usr/bin/env node

const { JSDOM } = require("jsdom");
const { Readability } = require("@mozilla/readability");

// Page address, taken from the first command-line argument
const url = process.argv[2];

// fromURL fetches the page and uses its address as the document URL
JSDOM.fromURL(url).then(function (dom) {
  const reader = new Readability(dom.window.document);
  const article = reader.parse();
  console.log(article.content);
});

facundoolano, Aug 17 '23 20:08

Hi @facundoolano. No one is actively working on this project at the moment, but if you're interested in making a PR that would close this issue, I'm happy to review it.

jemrobinson, Aug 18 '23 10:08