website-scraper-puppeteer
Suggestion: catch dynamic load contents
Since puppeteer is used inside this plugin, it should be able to monitor files that are loaded by JavaScript and add them to the download list, not only the ones written in the HTML.
Could this be achieved?
Hi @zhangciwu, sorry for the late response. For now it's only possible to download files that are added to the HTML markup. It does not monitor what is loaded dynamically by JavaScript. Could you please provide an example of such a website so I can check what can be done?
Like some web games, which load resources after the JS executes. Here is an example game
Not sure if games can be a target for automated scraping - they are more complex than a regular HTML page, I think.
As for dynamic content loading by JS - looks like it's possible to capture it by using puppeteer's page.setRequestInterception functionality. It may also require some updates to the plugin functionality in the website-scraper module (to allow multiple resources to be returned from 1 request).
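A rough sketch of the interception idea, assuming a puppeteer `page` object is already available. `collectRequestedUrls` is a hypothetical helper name, not something that exists in website-scraper or this plugin; it only records the URLs the page requests and does not hook into the scraper's download list.

```javascript
// Sketch only: `collectRequestedUrls` is a hypothetical helper, not part of
// website-scraper or website-scraper-puppeteer.
async function collectRequestedUrls(page) {
  const urls = new Set();

  // With interception enabled, each request is paused until it is
  // continued, aborted, or answered.
  await page.setRequestInterception(true);

  page.on('request', (request) => {
    // Record every URL the page asks for, including ones triggered by JS.
    urls.add(request.url());
    // Let the request proceed unchanged; otherwise the page would stall.
    request.continue();
  });

  // The set keeps filling up while the page is loading and running scripts.
  return urls;
}
```

The helper would need to be called before `page.goto(...)` so that requests made during the initial load are seen too. Turning the collected URLs into downloaded resources would still need the plugin changes mentioned above.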
I'm not going to work on it now since I do not have time for this. Contributions are welcome :)
I don't think that such functionality should be a part of this module.
It seems deeper puppeteer integration is required to intercept asynchronous requests, and it should be done separately.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.