website-scraper-puppeteer

Suggestion: catch dynamic load contents

Open zhangciwu opened this issue 5 years ago • 4 comments

Since puppeteer is used inside this plugin, it should be able to monitor files that are loaded by JavaScript and add them to the download list, not only the ones written in the HTML.

Could this be achieved?

zhangciwu • Jan 19 '20 08:01

Hi @zhangciwu, sorry for the late response. For now it's only possible to download files that are added to the HTML markup. The plugin does not monitor what is loaded dynamically by JavaScript. Could you please provide an example of such a website so I can check what can be done?

s0ph1e • Jan 28 '20 15:01

For example, some web games load their resources only after the JS executes. Here is an example game.

zhangciwu • Feb 08 '20 17:02

I'm not sure games can be a target for automated scraping. I think they are more complex than a regular HTML page.

As for dynamic content loaded by JS: it looks like it's possible to capture it using puppeteer's page.setRequestInterception functionality. It may also require some updates to the plugin functionality in the website-scraper module (to allow multiple resources to be returned from one request).
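
To illustrate the request interception idea, here is a minimal sketch. It is not how this plugin works today. `collectRequestedUrls` is a hypothetical helper name, and the code only assumes a puppeteer-like `page` object that provides `setRequestInterception` and emits `request` events. It records every URL the page asks for, including ones requested by scripts after load. It does not download anything or hand resources back to website-scraper.

```javascript
// Sketch only: `collectRequestedUrls` is a hypothetical helper, not part of
// website-scraper or website-scraper-puppeteer.
// `page` is expected to behave like a puppeteer Page.
async function collectRequestedUrls(page) {
  const urls = new Set();

  // Enable interception before navigating so that early requests are seen too.
  await page.setRequestInterception(true);

  page.on('request', async (request) => {
    // Record every requested URL, including those triggered by scripts.
    urls.add(request.url());
    try {
      // An intercepted request must be continued (or aborted), otherwise it stalls.
      await request.continue();
    } catch (err) {
      // The request may already have been handled elsewhere; ignore in this sketch.
    }
  });

  return urls;
}

// Possible usage with puppeteer (assumes `npm install puppeteer`; the URL is a placeholder):
//
//   const puppeteer = require('puppeteer');
//   const browser = await puppeteer.launch();
//   const page = await browser.newPage();
//   const urls = await collectRequestedUrls(page);
//   await page.goto('https://example.com', { waitUntil: 'networkidle0' });
//   console.log([...urls]);
//   await browser.close();
```

The returned set keeps filling for as long as the page stays open, so it should be read after the page has settled. Resources that load only after user interaction, as in many games, would still be missed by this approach.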

I'm not going to work on it now since I do not have time for this. Contributions are welcome :)

s0ph1e • Feb 20 '20 15:02

I don't think that such functionality should be a part of this module.

It seems that deeper puppeteer integration is required to intercept asynchronous requests, and it should be done separately.

aivus • Dec 27 '21 19:12

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] • Nov 16 '22 21:11