parsedmarc

[HELP NEEDED] Cleaning database from some domains

Open davidande opened this issue 1 year ago • 3 comments

Hello, I have been using parsedmarc for a long time and currently log almost 200 different domain names. I no longer need to log some of these domains, and I would like to clean their data out of the database. Could anyone help me with this? Is there an Elasticsearch command that would remove all entries generated by one specific domain? Thanks for your help.

David

davidande avatar May 17 '24 08:05 davidande

Hi @davidande ,

Which data storage solution do you use?

Szasza avatar May 30 '24 10:05 Szasza

Hi @Szasza, my database is stored in Elasticsearch 8.13.0.

davidande avatar Aug 09 '24 11:08 davidande

@davidande you will need to write a script that lists all the indices in your ES installation and calls the Delete by query API on each one with the following:

POST /<index_name>/_delete_by_query
{
  "query": {
    "match": {
      "header_from": "<domain_here>"
    }
  }
}

Please note the following:

  • The reason why you have to iterate through the indices is that parsedmarc stores the reports in separate indices based on the date of the reports.
  • The aggregate reports' indices will have dmarc_aggregate in their names, while forensic reports will have dmarc_forensic.
  • The above query checks the header_from field which is only present in aggregate reports, but not in forensic reports.
  • You may want to check the envelope_from field as well (aggregate), and perhaps envelope_to (aggregate), either separately or as part of a compound query. I am not sure exactly what your deletion criterion "all entries generated by one specific domain" means.
  • If you need to delete forensic reports too, you also need to run a deletion on the forensic indices, matching the domain field and perhaps the dkim_domain one. Again, this depends on your use case.
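For illustration, the steps above could be sketched in Python roughly as follows. This is only a minimal sketch, not a tested tool: the URL `http://localhost:9200`, the index patterns, and the helper names (`build_delete_query`, `es_request`, `list_indices`, `delete_domain`) are assumptions of mine, and authentication/TLS (enabled by default on ES 8.x) is left out, so you would need to add it for your setup. It defaults to a dry run that only counts matching documents via the Count API.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: adjust to your installation


def build_delete_query(domain, fields=("header_from",)):
    """Build a query body matching `domain` in any of the given fields."""
    clauses = [{"match": {field: domain}} for field in fields]
    if len(clauses) == 1:
        return {"query": clauses[0]}
    # Several fields: a document matches if at least one clause matches.
    return {"query": {"bool": {"should": clauses, "minimum_should_match": 1}}}


def es_request(method, path, body=None):
    """Send a JSON request to Elasticsearch and return the decoded reply."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        ES_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def list_indices(pattern):
    """List index names matching a pattern such as 'dmarc_aggregate*'."""
    rows = es_request("GET", f"/_cat/indices/{pattern}?format=json")
    return [row["index"] for row in rows]


def delete_domain(domain, pattern="dmarc_aggregate*",
                  fields=("header_from",), dry_run=True):
    """Count (dry run) or delete documents for `domain`, index by index."""
    body = build_delete_query(domain, fields)
    endpoint = "_count" if dry_run else "_delete_by_query"
    result_key = "count" if dry_run else "deleted"
    results = {}
    for index in list_indices(pattern):
        reply = es_request("POST", f"/{index}/{endpoint}", body)
        results[index] = reply.get(result_key)
    return results

# Example (hypothetical domain), run the dry run first and check the counts:
#   delete_domain("example.com")
#   delete_domain("example.com", dry_run=False)
#   delete_domain("example.com", pattern="dmarc_forensic*",
#                 fields=("domain",), dry_run=False)
```

I would take a snapshot (or at least review the dry-run counts) before running with `dry_run=False`, since the deletion cannot be undone.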

I hope the above helps.

Szasza avatar Sep 22 '24 10:09 Szasza