Make -j use only the flags passed for a custom json output:

Open acidvegas opened this issue 1 year ago • 7 comments

Currently, using -j for JSON gives you a fixed struct of data that cannot be customized. It would be nice if it respected all the flags given and only included data based on those flags in the JSON output.

acidvegas avatar Dec 22 '23 18:12 acidvegas

@acidvegas, an example would be great. Thanks!

dogancanbakir avatar Dec 25 '23 10:12 dogancanbakir

lol....Is this not obvious enough the way it was written...?

09:16:03 acidvegas@blackhole ~ : echo "supernets.org" | httpx -j

    __    __  __       _  __
   / /_  / /_/ /_____ | |/ /
  / __ \/ __/ __/ __ \|   /
 / / / / /_/ /_/ /_/ /   |
/_/ /_/\__/\__/ .___/_/|_|
             /_/

                projectdiscovery.io

[INF] Current httpx version v1.3.7 (latest)
{"timestamp":"2023-12-26T21:16:16.705942394-05:00","hash":{"body_md5":"4ae9394eb98233b482508cbda3b33a66","body_mmh3":"-4111954","body_sha256":"89e06e8374353469c65adb227b158b265641b424fba7ddb2c67eef0c4c1280d3","body_simhash":"9814303593401624250","header_md5":"4cfadf5f1f1978f0fdef47cc1bdd1d7e","header_mmh3":"-695244662","header_sha256":"b27c984facd2d2cf5af552c35d04358cfe67654af8ac7e56ba35fb6b2463ed7d","header_simhash":"10962523587435278190"},"port":"443","url":"https://supernets.org","input":"supernets.org","title":"SuperNETs","scheme":"https","webserver":"nginx","content_type":"text/html","method":"GET","host":"51.89.151.158","path":"/","time":"595.139946ms","a":["51.89.151.158","2001:41d0:801:2000::5ce9"],"words":436,"lines":79,"status_code":200,"content_length":4597,"failed":false,"knowledgebase":{"PageType":"nonerror","pHash":0}}
09:16:16 acidvegas@blackhole ~ : echo "supernets.org" | httpx -title -ip -j

    __    __  __       _  __
   / /_  / /_/ /_____ | |/ /
  / __ \/ __/ __/ __ \|   /
 / / / / /_/ /_/ /_/ /   |
/_/ /_/\__/\__/ .___/_/|_|
             /_/

                projectdiscovery.io

[INF] Current httpx version v1.3.7 (latest)
{"timestamp":"2023-12-26T21:16:39.09376234-05:00","hash":{"body_md5":"4ae9394eb98233b482508cbda3b33a66","body_mmh3":"-4111954","body_sha256":"89e06e8374353469c65adb227b158b265641b424fba7ddb2c67eef0c4c1280d3","body_simhash":"9814303593401624250","header_md5":"fd96bc9b49442b1e6fa7eed2f39cc6b5","header_mmh3":"1014721133","header_sha256":"58df28fdf6e0abdfe273c785940acf42a18820b254c58d43bac5fa846ed2d5eb","header_simhash":"10962523587435276142"},"port":"443","url":"https://supernets.org","input":"supernets.org","title":"SuperNETs","scheme":"https","webserver":"nginx","content_type":"text/html","method":"GET","host":"51.89.151.158","path":"/","time":"529.102556ms","a":["51.89.151.158","2001:41d0:801:2000::5ce9"],"words":436,"lines":79,"status_code":200,"content_length":4597,"failed":false,"knowledgebase":{"PageType":"nonerror","pHash":0}}
09:16:39 acidvegas@blackhole ~ :

Same data no matter what flags you pass... AKA you can't customize the JSON output. JSON is the only really useful output HTTPX has, and it's completely lacking any customization. I don't care to calculate the body hash of websites when I am scanning hundreds of thousands of websites, but I have no way to turn that off when using JSON output.

acidvegas avatar Dec 27 '23 02:12 acidvegas

@acidvegas CLI flags are unrelated to the JSON structure. It's a design choice to emit all the data when JSON is requested, in order to avoid multiple marshaling/unmarshaling passes and map allocations. The only flags that we decided to include are, for example, those involving very large fields such as request/response, as they might produce a large amount of data. You can easily customize the JSON output with tools like jq.

Mzack9999 avatar Dec 27 '23 10:12 Mzack9999
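
For readers unfamiliar with the jq-style post-processing suggested in the comment above, here is a minimal Python sketch of the same idea: reduce each JSON line to a chosen set of keys. It is not part of httpx. The helper name `project_fields` is hypothetical, and the keys `url`, `title`, and `host` are simply taken from the sample output earlier in this thread.

```python
import json
from typing import Iterable, Iterator


def project_fields(lines: Iterable[str], keep: list[str]) -> Iterator[str]:
    """Yield each JSONL record reduced to the keys listed in `keep`."""
    for line in lines:
        line = line.strip()
        # Defensive: skip blank lines and anything that is not a JSON object,
        # such as banner or [INF] lines if they end up in the same stream.
        if not line.startswith("{"):
            continue
        record = json.loads(line)
        # Keep only the requested keys, in the order they were requested.
        slim = {key: record[key] for key in keep if key in record}
        yield json.dumps(slim, separators=(",", ":"))


# Small in-memory sample modeled on the output shown in this thread.
sample = [
    "[INF] Current httpx version v1.3.7 (latest)",
    '{"url":"https://supernets.org","title":"SuperNETs",'
    '"host":"51.89.151.158","hash":{"body_md5":"4ae9394eb98233b482508cbda3b33a66"}}',
]

for out in project_fields(sample, ["url", "title", "host"]):
    print(out)
```

Note that a filter like this only trims the output after the fact. It does not stop the fields from being computed during the scan, which is the cost raised in the replies that follow.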

@Mzack9999 this JSON doesn't even include favicon hashes or the data from -td (tech-detect) even if you pass it... and jq? I'm sorry, but that is a fuckton of post-processing if you're doing a large scan like millions of domains. You know how much time body hashing adds to a scan? Especially if you don't even need the body hash.

Just saying, HTTPX is like the king of web scanning right now, but it's missing a very core element in allowing users to DO SOMETHING with the data, aka a proper JSON dict. I really can't see how it's practical to pass flags to show content in the terminal but have completely unrelated content showing up in the JSON data.

acidvegas avatar Dec 27 '23 16:12 acidvegas

I'm unable to reproduce the behavior you mentioned with -td:

$ echo projectdiscovery.io | httpx -td -json
...
"tech": [
    "Cloudflare",
    "HSTS"
],

Could you provide the full command line if it's different from the one I provided? The favicon is not included by default unless specified, as it requires additional requests to the server to locate the favicon file and calculate the correct hash. Generally, the JSON includes all the info that does not require additional network activity and has negligible overhead. Any other info that requires further requests is optional.

Initially the tool didn't support the JSON format; it was conceived as a generic HTTP toolkit for rapid checks on specific data (for example title, status code, etc.). JSON was added later because many users needed to dump all the info to a file while showing only specific fields on the command line, for example to grep. Unfortunately, the requested change would be a breaking one, as the currently expected behavior is to have everything in the JSON. I fully agree with you that jq's syntax is not very friendly for complex processing. The closest thing I can think of is implementing another tool that can be easily plugged into httpx and applies JSON manipulation based on configuration files. For example (assuming the tool is called jqx), you have a configuration file like:

exclude:
  - title
  - status-code

and it's invoked like:

httpx -l list.txt | jqx

What do you think about it?

Mzack9999 avatar Dec 27 '23 20:12 Mzack9999
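
To make the jqx idea above concrete, here is a rough Python sketch of what such an exclude-based filter could look like. jqx does not exist as far as this thread shows, so everything here is hypothetical: the function names `parse_exclude` and `drop_fields`, the tiny config reader (which is not a real YAML parser), and in particular the assumption that a config name like `status-code` maps to the JSON key `status_code`.

```python
import json


def parse_exclude(config_text: str) -> set[str]:
    """Read the `exclude:` list from the small config format sketched above.

    Not a real YAML parser: it only understands a top-level `exclude:` key
    followed by `- item` lines.
    """
    excluded: set[str] = set()
    in_exclude = False
    for raw in config_text.splitlines():
        line = raw.strip()
        if line == "exclude:":
            in_exclude = True
        elif in_exclude and line.startswith("- "):
            # Assumption: hyphenated config names map to underscored JSON keys.
            excluded.add(line[2:].strip().replace("-", "_"))
        elif line:
            # Any other non-empty line ends the exclude list.
            in_exclude = False
    return excluded


def drop_fields(json_line: str, excluded: set[str]) -> str:
    """Return the JSON record with every top-level key in `excluded` removed."""
    record = json.loads(json_line)
    kept = {key: value for key, value in record.items() if key not in excluded}
    return json.dumps(kept, separators=(",", ":"))


config = "exclude:\n  - title\n  - status-code\n"
line = '{"url":"https://supernets.org","title":"SuperNETs","status_code":200}'
print(drop_fields(line, parse_exclude(config)))
```

As with jq, a filter of this kind runs after httpx has already produced each record, so it changes what is stored but not how long the scan takes.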

@acidvegas I fully agree with you. I have the same need: I scan at scale, more than 1 million domains, and it's really weird that httpx with just the -json flag dumps a lot of data that is useless to me. The major problem is that it calculates a lot of hashes, which takes a lot of extra time!

localhost-MouhannadlrX avatar Jan 03 '24 11:01 localhost-MouhannadlrX

It's ridiculous lol. Look at this:

"hash": {
    "body_md5":"4ae9394eb98233b482508cbda3b33a66",
    "body_mmh3":"-4111954",
    "body_sha256":"89e06e8374353469c65adb227b158b265641b424fba7ddb2c67eef0c4c1280d3",
    "body_simhash":"9814303593401624250",
    "header_md5":"980366deb2b2fb5df2ad861fc63e79ce",
    "header_mmh3":"-813072798",
    "header_sha256":"39aea75ad548e38b635421861641ad1919ed3b103b17a33c41e7ad46516f736d",
    "header_simhash":"10962523587435277678"
},

Why do I need 8 fucking hash types when I didn't even supply an argument to collect body hashes? jq and jqx are not a solution; that literally just adds even more time to the post-processing...

acidvegas avatar Jan 14 '24 18:01 acidvegas

Hashes are now excluded from the default JSONL output: https://github.com/projectdiscovery/httpx/releases/tag/v1.6.0

ehsandeep avatar Mar 06 '24 19:03 ehsandeep