katana
Create analyzer for scraping new navigation from headless page states
- Anchor, Button, Embed, and Iframe elements for direct links.
- Optionally parse and fill HTML forms as well (login, register, etc.) to reach the states behind them.
- Scrape inline JavaScript and JavaScript files, collecting links with regex.
- Collect requests made by XHR and other JavaScript APIs as well.
- Navigate elements that have event listeners, either by querying the DOM or via JS hooks. (Decide which of the two approaches to use.)
- Other relevant information can be added in the future, depending on demand.
> - Scrape javascript / javascript files and collect links using regex.
This will be only slightly effective. Many JS scripts include other JS scripts conditionally, which means the full URL can only be learned after the DOM is rendered; Google Analytics does this, for example.
A better strategy is to look at the "final" DOM and grab all JS links and info from there; by that point they will all have been resolved and are available to inspect.
JS checks we should implement against the rendered DOM:
- Subresource Integrity (SRI) validation failures.
- Cross-domain script includes where DNS resolution fails.
- Cross-domain script includes where the file is not found.
- Known-malicious JS (e.g. Magecart) from Anomali and Trend Micro indicators (and others).