parser
How to `clone` the `video` portion of the HTML page in order to extract and keep it intact?
For example, from this URL: https://abcnews.go.com/Politics/arizona-gov-doug-ducey-signs-law-purge-voters/story?id=77606533&cid=clicksource_4380645_1_heads_hero_live_hero_image
I would like to keep the streaming video in the extracted content.
I tried to modify the abcnews.go.com extractor like this:
export const AbcnewsGoComExtractor = {
  domain: 'abcnews.go.com',
  title: {
    selectors: ['.article-header h1'],
  },
  author: {
    selectors: ['.authors'],
    clean: ['.author-overlay', '.by-text'],
  },
  date_published: {
    selectors: ['.timestamp'],
    timezone: 'America/New_York',
  },
  lead_image_url: {
    selectors: [['meta[name="og:image"]', 'value']],
  },
  video: {
    selectors: [
      'inline-video-wrapper',
      'video',
    ],
  },
  content: {
    defaultCleaner: false,
    selectors: [
      '.article-copy',
      '#player-api',
      'inline-video-wrapper',
      'video',
    ],
    // Is there anything that is in the result that shouldn't be?
    // The clean selectors will remove anything that matches from
    // the result
    clean: [],
  },
};
But this is the output:

I also tried it this way, but it doesn't work either:
'div.inline-content': $node => {
  if ($node.has('img,iframe,video').length > 0) {
    return $node;
  }
},
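For comparison, here is a minimal sketch of how a function transform is usually written, going by the parser's README, which describes a function transform as returning a tag name (a string) to convert the matched node to, or nothing to leave it alone. Returning the node itself, as in the attempt above, is not one of those two cases. The helper name `keepMediaTransform` and the choice of `figure` as the target tag are assumptions made for illustration, and this has not been tested against the ABC News page. If the video player is built by client-side script after the page loads (also an assumption), it may not be present in the fetched HTML at all, and no selector or transform would recover it.

```javascript
// Hypothetical helper name; the selector and the media check are taken
// from the attempt above.
const keepMediaTransform = $node => {
  // Returning a string asks the parser to convert the matched node to that tag.
  // 'figure' is an assumed target, not a confirmed fix for this site.
  if ($node.has('img,iframe,video').length > 0) {
    return 'figure';
  }
  // Returning null means no conversion is requested for this node.
  return null;
};

// Transforms are placed inside the `content` block of the custom extractor.
const content = {
  selectors: ['.article-copy'],
  defaultCleaner: false,
  transforms: {
    'div.inline-content': keepMediaTransform,
  },
  clean: [],
};
```

The `content` object would replace the `content` key of the extractor shown earlier; the other fields stay unchanged.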
OS: Ubuntu 18.04
Are there any updates on this?