httpx
Improve FQDN extraction from response body using parsers
Please describe your feature request:
- currently we use a regex to extract potential domains from the response body and then apply some heuristic rules
- this can be further improved by using actual parsers:
  - html -> goquery
  - javascript -> goja ast parser (https://github.com/dop251/goja/tree/master/parser)
- parsers allow us to locate specific contexts, filter out everything else, and extract FQDNs only from those places (e.g. `src` and `href` attributes in HTML, string literals in JavaScript)
Describe the use case of this feature:
- fewer false positives (FP) and false negatives (FN) in the extracted domains