katana
Create analyzer for scraping new navigation from headless page states
- Anchor, Button, Embed, and Iframe elements for direct links.
- Optionally parse and fill HTML forms as well (login, register, etc.) to reach the states behind them.
- Scrape inline JavaScript and JavaScript files, collecting links with regex.
- Collect requests made by XHR and other JavaScript APIs as well.
- Navigate elements that have event listeners, either by querying the DOM or via JS hooks. (Decide which of the two approaches to use.)
- Other relevant information can be added in the future, depending on demand.
> - Scrape javascript / javascript files and collect links using regex.
This will be only slightly effective. Many JS scripts include other JS scripts conditionally, which means the full URL can only be learned after the DOM is rendered; Google Analytics does this, for example.
A better strategy is to look at the "final" DOM and grab all JS links and info from there; by that point they will all have been resolved and are available to inspect.
JS checks we should implement against the rendered DOM:
- Subresource Integrity (SRI) validation failures.
- Cross-domain script includes where DNS resolution fails.
- Cross-domain script includes where the file is not found.
- Known-malicious JS (e.g. Magecart) from Anomali and Trend Micro indicators (and others).