website-scraper-puppeteer Suggestion: catch dynamic load contents


Status: open · zhangciwu opened this issue 5 years ago · 4 comments

Since this plugin uses puppeteer internally, it should be able to monitor files loaded by JavaScript and add them to the download list, not only the ones referenced in the HTML.

Could this be achieved?

Jan 19 '20 · zhangciwu

Hi @zhangciwu, sorry for the late response. For now it's only possible to download files that are referenced in the HTML markup; the plugin does not monitor what JavaScript loads dynamically. Could you please provide an example of such a website so I can check what can be done?

Jan 28 '20 · s0ph1e

For example, some web games load their resources only after the JavaScript executes. Here is an example game.

Feb 08 '20 · zhangciwu

I'm not sure games can be a target for automated scraping; I think they are more complex than a regular HTML page.

As for content loaded dynamically by JS, it looks like it's possible to capture it using puppeteer's `page.setRequestInterception` functionality. It may also require some updates to the plugin functionality in the website-scraper module (to allow multiple resources to be returned from one request).
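A minimal sketch of the `page.setRequestInterception` idea, assuming puppeteer's documented interception pattern: enable interception, listen for the page's `'request'` event, and call `continue()` on every request. `RequestLike`, `PageLike`, `collectDynamicUrls` and the `DOWNLOADABLE` list of resource types are hypothetical names for illustration only; they are not part of website-scraper or website-scraper-puppeteer. The interfaces model just the few methods used, so the collection logic can be exercised without launching a browser.

```typescript
// Structural stand-ins for the parts of puppeteer's HTTPRequest and Page
// used below (assumption: only url(), resourceType(), continue(),
// setRequestInterception() and the 'request' event are needed).
interface RequestLike {
  url(): string;
  resourceType(): string;
  continue(): Promise<void> | void;
}

interface PageLike {
  setRequestInterception(enabled: boolean): Promise<void>;
  on(event: "request", handler: (req: RequestLike) => void): unknown;
}

// Hypothetical filter: resource types worth adding to the download list.
// "document" is skipped because the HTML page itself is already scraped.
const DOWNLOADABLE = new Set<string>([
  "script", "stylesheet", "image", "font", "media", "xhr", "fetch",
]);

// Hypothetical helper: enables interception and records every http(s) URL
// of a downloadable type that the page requests, including requests made
// by JavaScript after the initial HTML has loaded. The returned set keeps
// filling for as long as the page keeps issuing requests.
async function collectDynamicUrls(page: PageLike): Promise<Set<string>> {
  const seen = new Set<string>();
  await page.setRequestInterception(true);
  page.on("request", (req) => {
    const url = req.url();
    if (/^https?:/i.test(url) && DOWNLOADABLE.has(req.resourceType())) {
      seen.add(url); // a Set drops repeated requests for the same URL
    }
    // With interception on, every request must be continued (or aborted),
    // otherwise the page stalls waiting for it.
    void req.continue();
  });
  return seen;
}
```

With real puppeteer, one would call the helper on a page from `browser.newPage()` before `page.goto(url, { waitUntil: 'networkidle0' })`, so that requests fired by scripts after the initial HTML are recorded too. Feeding those extra URLs into website-scraper's download list is the part this sketch leaves open; as noted above, the plugin interface would need to let one request yield several resources.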

I'm not going to work on it now since I do not have time for this. Contributions are welcome :)

Feb 20 '20 · s0ph1e

I don't think that such functionality should be a part of this module.

It seems a deeper puppeteer integration is required to intercept asynchronous requests, and that should be done separately.

Dec 27 '21 · aivus

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Nov 16 '22 · stale[bot]
