feat: update URL instead of creating new records in mongoDB

Open romisfrag opened this issue 1 year ago • 2 comments

For example, I ran the katana command: [image]

katana does not return the status code (it tries to follow the redirect, which is circular, so it can't). But that is not the problem. Suppose I then run httpx on every URL that katana found: [image] I will then have 4 records in my database: 2 created by katana and 2 created by httpx.

I know there is a _source field that stores the tool that found the URL, but doesn't this duplication cause a problem?

Maybe there is nothing to do here, and I am just describing intended behavior.

romisfrag, Sep 13 '24 21:09

Maybe the most problematic case is running the same tool twice: the new records also get added to the database. [image] I get the first 4 records, and then 4 new ones again.

romisfrag, Sep 13 '24 21:09

This is mostly by design: updating a previous record would require going through the whole list of URLs in MongoDB, which would be too heavy as the number of records grows.

The vision is rather to use the de-duplication of findings already implemented in MongoDB for the target workspace: you can query URLs with {"_type": "url", "_context.workspace_id": <WORKSPACE_ID>, "_context.workspace_duplicate": false} and you would get de-duplicated results from the workspace.
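
As a rough sketch of that query from Python: the filter keys below come from the query above, but everything else is an assumption made for illustration. The helper names (build_dedup_url_filter, find_unique_urls, matches) are hypothetical, the type of the workspace ID is not specified here, and the collection is whatever collection holds your findings. The small matches helper only mimics MongoDB's dotted-key equality so the filter can be tried on in-memory records without a running database.

```python
from typing import Any, Dict, List


def build_dedup_url_filter(workspace_id: Any) -> Dict[str, Any]:
    """Build the filter from the comment above for one workspace."""
    return {
        "_type": "url",
        "_context.workspace_id": workspace_id,
        "_context.workspace_duplicate": False,
    }


def find_unique_urls(collection: Any, workspace_id: Any) -> List[dict]:
    """Run the filter against a collection-like object exposing .find().

    With pymongo, `collection` would be a Collection instance; which
    collection stores the findings is an assumption left to the caller.
    """
    return list(collection.find(build_dedup_url_filter(workspace_id)))


def get_nested(doc: dict, dotted_key: str) -> Any:
    """Resolve a dotted key such as '_context.workspace_id' in a dict."""
    current: Any = doc
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def matches(doc: dict, query: Dict[str, Any]) -> bool:
    """Equality-only stand-in for MongoDB matching, for local testing."""
    return all(get_nested(doc, key) == value for key, value in query.items())


# Hypothetical records: the same URL found by two tools, one flagged duplicate.
records = [
    {"_type": "url", "url": "http://example.com/a", "_source": "katana",
     "_context": {"workspace_id": "ws1", "workspace_duplicate": False}},
    {"_type": "url", "url": "http://example.com/a", "_source": "httpx",
     "_context": {"workspace_id": "ws1", "workspace_duplicate": True}},
    {"_type": "url", "url": "http://example.com/b", "_source": "katana",
     "_context": {"workspace_id": "ws2", "workspace_duplicate": False}},
]

query = build_dedup_url_filter("ws1")
unique = [r for r in records if matches(r, query)]
```

Here only the first record passes the filter: the second is flagged as a workspace duplicate and the third belongs to another workspace.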

Ideally, katana should run httpx internally (using the Go library) to enrich its results, so that we don't need to run httpx separately at all.

ocervell, Sep 13 '24 21:09