sort generated data files for idempotence
Purpose
This PR improve idempotency of generating data files by sorting entries in *.data files and ensuring JSON dump are also sorted.
The goal is to reduce the mess of deletions and insertions from git diff when t/*.data files are modified.
Context
Fixes #1265
How to test this PR
run ZONEMASTER_RECORD=1 make test to save new formated *.data file.
- "t/*.data" file must be sorted
- each "Zonemaster::LDNS::Packet" JSON dump must also be sorted
- each field "data" of ""Zonemaster::LDNS::Packet" must start with 'AA' to reflect the setting of ID field to 0, QR bit is set to 1 and MSB of OPCODE is 0.
Notes
To improve visibility of change we can also activate indentation of JSON dump with $json->indent( 1 );.
This makes it easier to read the differences between JSON dump with vimdiff or git diff but makes *.data files less readable/compact.
* each field "data" of ""Zonemaster::LDNS::Packet" must start with 'AAC' to reflect the setting of ID field to 0, QR bit is set to 1 and MSB of OPCODE is 0.
What benefit does that give?
(...) but makes
*.datafiles less readable/compact.
What are the backsides of that?
Please give a more descriptive headline of the PR.
(...) but makes
*.datafiles less readable/compact.What are the backsides of that?
*.data files will no longer resemble log files where there is only one entry per line.
The main drawback is that we have to rewrite the *.data loading code.
After reflection, I think this is not a good idea. If we need to compare different versions of *.data files we can always make a small script for that.
* each field "data" of ""Zonemaster::LDNS::Packet" must start with 'AA' to reflect the setting of ID field to 0, QR bit is set to 1 and MSB of OPCODE is 0.What benefit does that give?
To ensure the reproducibility of *.data files when doing a ZONEMASTER_RECORD=1 make test.
Otherwise, each time a new identifier will be used and the packet will be considered to have changed even if the content of the request is identical.
(...) but makes
*.datafiles less readable/compact.What are the backsides of that?
*.datafiles will no longer resemble log files where there is only one entry per line. The main drawback is that we have to rewrite the *.data loading code. After reflection, I think this is not a good idea. If we need to compare different versions of *.data files we can always make a small script for that.
I cannot remember that I have ever compare data files. Unless someone has pointed out that he/she really wants it I think it is better not to change. I assume you will remove that change.
(...) but makes
*.datafiles less readable/compact.What are the backsides of that?
*.datafiles will no longer resemble log files where there is only one entry per line. The main drawback is that we have to rewrite the *.data loading code. After reflection, I think this is not a good idea. If we need to compare different versions of *.data files we can always make a small script for that.I cannot remember that I have ever compare data files. Unless someone has pointed out that he/she really wants it I think it is better not to change. I assume you will remove that change.
This is something i proposed, it is not implemented in this PR. (i have edited the PR's notes)
With this approach we gain idempotence in the sense that the saved file is the same every time (provided that no code or configuration has changed). But we also lose idempotence in the sense that the original analysis sees the real IDs, but when we analyze a restored cache the analysis sees zeroed out IDs. In this light wouldn't it be more proper to clear the ID at response reception? I.e. before analysis and insertion into the cache.
I don't think we're ever looking at the ID fields in our built-in test modules, but since we want to support custom test modules, I believe it could make sense to have idempotence in both these ways.
I don't think we're ever looking at the ID fields in our built-in test modules, but since we want to support custom test modules, I believe it could make sense to have idempotence in both these ways.
The thing about Message ID is that if the response does not have the same Message ID as the query it is not a response to that query and should be ignored or match with another query.
I don't think we're ever looking at the ID fields in our built-in test modules, but since we want to support custom test modules, I believe it could make sense to have idempotence in both these ways.
The thing about Message ID is that if the response does not have the same Message ID as the query it is not a response to that query and should be ignored or match with another query.
Yes, it wouldn't make much sense to clear the response's Message ID before it's matched to the request's Message ID. The ldns library takes care of this matching for us and just returns the correct response matching our request.
What I'm suggesting is that we clear the ID field in Zonemaster Engine or perhaps in Zonemaster LDNS after ldns returns it but before we add it to the cache.
I suggest that we discuss this at the F2F.
@MichaelTimbert, please merge.
Release testing v2024.2: no issues found.