zonemaster-engine icon indicating copy to clipboard operation
zonemaster-engine copied to clipboard

sort generated data files for idempotence

Open MichaelTimbert opened this issue 1 year ago • 10 comments

Purpose

This PR improve idempotency of generating data files by sorting entries in *.data files and ensuring JSON dump are also sorted. The goal is to reduce the mess of deletions and insertions from git diff when t/*.data files are modified.

Context

Fixes #1265

How to test this PR

run ZONEMASTER_RECORD=1 make test to save new formated *.data file.

  • "t/*.data" file must be sorted
  • each "Zonemaster::LDNS::Packet" JSON dump must also be sorted
  • each field "data" of ""Zonemaster::LDNS::Packet" must start with 'AA' to reflect the setting of ID field to 0, QR bit is set to 1 and MSB of OPCODE is 0.

Notes

To improve visibility of change we can also activate indentation of JSON dump with $json->indent( 1 );. This makes it easier to read the differences between JSON dump with vimdiff or git diff but makes *.data files less readable/compact.

MichaelTimbert avatar May 31 '24 12:05 MichaelTimbert

* each field "data" of ""Zonemaster::LDNS::Packet" must start with 'AAC' to reflect the setting of ID field to 0, QR bit is set to 1 and MSB of OPCODE is 0.

What benefit does that give?

(...) but makes *.data files less readable/compact.

What are the backsides of that?

matsduf avatar May 31 '24 12:05 matsduf

Please give a more descriptive headline of the PR.

matsduf avatar May 31 '24 12:05 matsduf

(...) but makes *.data files less readable/compact.

What are the backsides of that?

*.data files will no longer resemble log files where there is only one entry per line. The main drawback is that we have to rewrite the *.data loading code. After reflection, I think this is not a good idea. If we need to compare different versions of *.data files we can always make a small script for that.

MichaelTimbert avatar Jun 03 '24 07:06 MichaelTimbert

* each field "data" of ""Zonemaster::LDNS::Packet" must start with 'AA' to reflect the setting of ID field to 0, QR bit is set to 1 and MSB of OPCODE is 0.

What benefit does that give?

To ensure the reproducibility of *.data files when doing a ZONEMASTER_RECORD=1 make test. Otherwise, each time a new identifier will be used and the packet will be considered to have changed even if the content of the request is identical.

MichaelTimbert avatar Jun 03 '24 08:06 MichaelTimbert

(...) but makes *.data files less readable/compact.

What are the backsides of that?

*.data files will no longer resemble log files where there is only one entry per line. The main drawback is that we have to rewrite the *.data loading code. After reflection, I think this is not a good idea. If we need to compare different versions of *.data files we can always make a small script for that.

I cannot remember that I have ever compare data files. Unless someone has pointed out that he/she really wants it I think it is better not to change. I assume you will remove that change.

matsduf avatar Jun 03 '24 09:06 matsduf

(...) but makes *.data files less readable/compact.

What are the backsides of that?

*.data files will no longer resemble log files where there is only one entry per line. The main drawback is that we have to rewrite the *.data loading code. After reflection, I think this is not a good idea. If we need to compare different versions of *.data files we can always make a small script for that.

I cannot remember that I have ever compare data files. Unless someone has pointed out that he/she really wants it I think it is better not to change. I assume you will remove that change.

This is something i proposed, it is not implemented in this PR. (i have edited the PR's notes)

MichaelTimbert avatar Jun 03 '24 16:06 MichaelTimbert

With this approach we gain idempotence in the sense that the saved file is the same every time (provided that no code or configuration has changed). But we also lose idempotence in the sense that the original analysis sees the real IDs, but when we analyze a restored cache the analysis sees zeroed out IDs. In this light wouldn't it be more proper to clear the ID at response reception? I.e. before analysis and insertion into the cache.

I don't think we're ever looking at the ID fields in our built-in test modules, but since we want to support custom test modules, I believe it could make sense to have idempotence in both these ways.

mattias-p avatar Jun 03 '24 18:06 mattias-p

I don't think we're ever looking at the ID fields in our built-in test modules, but since we want to support custom test modules, I believe it could make sense to have idempotence in both these ways.

The thing about Message ID is that if the response does not have the same Message ID as the query it is not a response to that query and should be ignored or match with another query.

matsduf avatar Jun 03 '24 22:06 matsduf

I don't think we're ever looking at the ID fields in our built-in test modules, but since we want to support custom test modules, I believe it could make sense to have idempotence in both these ways.

The thing about Message ID is that if the response does not have the same Message ID as the query it is not a response to that query and should be ignored or match with another query.

Yes, it wouldn't make much sense to clear the response's Message ID before it's matched to the request's Message ID. The ldns library takes care of this matching for us and just returns the correct response matching our request.

What I'm suggesting is that we clear the ID field in Zonemaster Engine or perhaps in Zonemaster LDNS after ldns returns it but before we add it to the cache.

mattias-p avatar Jun 05 '24 08:06 mattias-p

I suggest that we discuss this at the F2F.

matsduf avatar Sep 04 '24 14:09 matsduf

@MichaelTimbert, please merge.

matsduf avatar Nov 11 '24 18:11 matsduf

Release testing v2024.2: no issues found.

tgreenx avatar Dec 04 '24 18:12 tgreenx