article-extractor
Crashes on Pinterest and a lot of other websites
Pages to test on:
- https://www.pinterest.ca/variamsingh87/
- https://www.pinterest.com.au/seriako/
Code:
import { extract } from '@extractus/article-extractor'
const input = 'https://www.pinterest.ca/variamsingh87/'
await extract(input)
Error:
TypeError: Cannot read properties of null (reading 'tagName')
at Readability._grabArticle (/Users/vasyl/code/killme/node_modules/@mozilla/readability/Readability.js:1150:37)
at Readability.parse (/Users/vasyl/code/killme/node_modules/@mozilla/readability/Readability.js:2277:31)
at default (file:///Users/vasyl/code/killme/node_modules/@extractus/article-extractor/src/utils/extractWithReadability.js:18:25)
at file:///Users/vasyl/code/killme/node_modules/@extractus/article-extractor/src/utils/parseFromHtml.js:88:14
at file:///Users/vasyl/code/killme/node_modules/bellajs/src/utils/pipe.js:4:38
at file:///Users/vasyl/code/killme/node_modules/bellajs/src/utils/pipe.js:4:40
at file:///Users/vasyl/code/killme/node_modules/bellajs/src/utils/pipe.js:4:40
at default (file:///Users/vasyl/code/killme/node_modules/@extractus/article-extractor/src/utils/parseFromHtml.js:98:19)
at extract (file:///Users/vasyl/code/killme/node_modules/@extractus/article-extractor/src/main.js:24:10)
at processTicksAndRejections (node:internal/process/task_queues:96:5)
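As a stopgap while this is unfixed, a caller could stop the exception from taking down the whole process. The sketch below is only an illustration: `safeExtract` is a hypothetical helper, not part of `@extractus/article-extractor`, and it assumes the extractor is passed in as a function that returns a promise (as `extract` does in the code above). It does not fix the underlying bug, it only turns the thrown `TypeError` into a `null` result.

```javascript
// Hypothetical helper (not part of @extractus/article-extractor):
// runs the given extractor and converts a thrown error into a null result.
async function safeExtract(extractFn, url) {
  try {
    return await extractFn(url)
  } catch (err) {
    // For the pages above, err.message would be something like
    // "Cannot read properties of null (reading 'tagName')"
    console.error(`extract failed for ${url}: ${err.message}`)
    return null
  }
}
```

With the code from this report, the call would become `const article = await safeExtract(extract, input)`, and `article` would be `null` for the pages that currently crash.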
I presume that the bug is somewhere inside the linkedom package, in the DOMParser class.