feat: update URL instead of creating new records in MongoDB
For example, I ran the katana command.
Katana does not give the status code (because it tries to follow the redirect, which is circular, so it can't).
But that is not the problem. Imagine that I then run the following httpx command on every URL that katana found.
I will then have 4 records in my database: 2 created by katana and 2 created by httpx.
I know there is a field that stores the _source of the tool that found the URL, but doesn't this cause a problem?
Maybe there is nothing to do here, and I am just describing intended behavior.
Maybe the most problematic case is running the same tool twice: the new records also get added to the database.
I get the first 4, and then 4 new ones again.
This is mostly by design: updating a previous record would require going through the whole list of URLs in MongoDB, which would be too heavy as the number of records grows.
The vision is rather to use the de-duplication of findings already implemented in MongoDB for the target workspace, so you can query URLs with {"_type": "url", "_context.workspace_id": <WORKSPACE_ID>, "_context.workspace_duplicate": false} and get de-duplicated results from the workspace.
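To illustrate the query above, here is a minimal Python sketch. It builds the same filter and applies it to a few in-memory records, so it runs without a database. The helper names (`build_dedup_filter`, `get_path`, `matches`), the `url` field name, the workspace id `ws1`, and the sample records are all hypothetical and only there for illustration. Only `_type`, `_source`, `_context.workspace_id` and `_context.workspace_duplicate` come from this thread.

```python
from typing import Any


def build_dedup_filter(workspace_id: str) -> dict[str, Any]:
    """Filter for URL findings in one workspace that are not flagged as duplicates."""
    return {
        "_type": "url",
        "_context.workspace_id": workspace_id,
        "_context.workspace_duplicate": False,
    }


def get_path(doc: dict[str, Any], dotted: str) -> Any:
    """Resolve a dotted key such as '_context.workspace_id' in a nested dict."""
    current: Any = doc
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def matches(doc: dict[str, Any], query: dict[str, Any]) -> bool:
    """Equality-only matching, a tiny subset of MongoDB query semantics."""
    return all(get_path(doc, key) == value for key, value in query.items())


def record(url: str, source: str, duplicate: bool) -> dict[str, Any]:
    """Build a hypothetical finding; real records carry more fields."""
    return {
        "_type": "url",
        "_source": source,
        "url": url,  # assumed field name
        "_context": {"workspace_id": "ws1", "workspace_duplicate": duplicate},
    }


# Hypothetical data: katana found two URLs, then httpx probed the same two.
SAMPLE_RECORDS = [
    record("http://example.com/", "katana", True),
    record("http://example.com/login", "katana", True),
    record("http://example.com/", "httpx", False),
    record("http://example.com/login", "httpx", False),
]

query = build_dedup_filter("ws1")
deduped = [r for r in SAMPLE_RECORDS if matches(r, query)]

# With pymongo, the same dict can be passed to Collection.find, e.g.
#   db["findings"].find(query)
# where the collection name "findings" is an assumption.
```

Which of the duplicate records keeps `workspace_duplicate` set to false is decided by the de-duplication step itself; the sample data above simply assumes one possible outcome.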
Ideally, katana should run httpx internally, using the Go library, to enrich results so that we don't need to run the latter at all.