Introduce proactive test suite
Proactive tests
At the moment all our tests are reactive: they are written by devs guessing at ways in which the database can fail. We should add some fancy magic to generate tests that seek out bugs, kind of like an advanced form of fuzz testing.
At least at first, it will be limited to comparing results between two branches (e.g. a refactor branch and develop) - this would largely remove the need to predict (or interact with) the results of actions.
Project structure
The project is split into four distinct parts: first, the set of supported action types; second, the general-purpose generator(s); third, the engine responsible for orchestrating everything related to generating tests; and fourth, the test executor(s) responsible for executing the generated tests.
Actions
Each supported action type will implement a common interface with two functions.
The first function (called `Entropy` for now) takes in a set of parameters (potentially an actual `[]any`; the first param would likely be the DefraDB instance, to allow accounting for existing state) and returns the number of possible valid and relevant action-values that could be generated.
For example, given an `AddSchema` action-type, and assuming an empty database for simplicity, there would be two parameters: the number of schema to generate (`c`), and the maximum number of fields per schema (`n`). The number of potential scalar field types is a constant (`NFTypes`). Field names are not relevant/low-value to our testing and can be ignored as a variable, and we will not worry about directives yet. Because field names are ignored, a schema is characterised by how many fields it has of each type, giving us the function `n ^ (NFTypes + 2c)` (`2c` represents possible relation fields, one for the object and one for the id field).
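As a rough worked example, the sketch below plugs illustrative numbers into that formula using Go's `math/big` (the values `NFTypes = 16`, `c = 2`, and `n = 3` are assumptions made up purely for the example):

```go
package main

import (
	"fmt"
	"math/big"
)

func main() {
	// Illustrative values only: 16 scalar types, 2 schemas,
	// and at most 3 fields of any one type per schema.
	const nfTypes, c, n = 16, 2, 3

	// n ^ (NFTypes + 2c): each of the NFTypes + 2c possible field types
	// can appear in a schema with roughly n different counts.
	entropy := new(big.Int).Exp(big.NewInt(n), big.NewInt(nfTypes+2*c), nil)

	fmt.Println(entropy) // 3^20 = 3486784401
}
```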
This entropy value allows the generator to encode any possible combination of exclusively useful valid values into binary format without needing to know the specifics of the action - as long as the generated value is equal to or smaller than the entropy, it will be valid and unique (uniqueness being desirable to avoid unwanted bias in action generation).
In some cases entropy may become unreasonably large (e.g. generating docs containing lots of large fields), in which case the value returned from `Entropy` would need to represent a sampling of all potentially useful test values, not the actual true entropy.
The second function (called `ConstructValue` for now) takes the encoded value generated by the generator (using `Entropy`) and decodes it into a valid test action.
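Put together, the per-action-type interface might look something like the minimal sketch below. The names `Entropy` and `ConstructValue` come from the descriptions above; everything else (the package, parameter, and return types) is an assumption for illustration:

```go
package proactive

import "math/big"

// Action is a single generated test step, e.g. an AddSchema call.
// Execution-side details belong to the executors, not to generation.
type Action any

// ActionType describes one supported kind of test action.
type ActionType interface {
	// Entropy returns the number of distinct valid and relevant
	// action-values that could be generated; params[0] would likely be
	// the DefraDB instance, to allow accounting for existing state.
	Entropy(params []any) *big.Int

	// ConstructValue decodes an encoded value produced by the generator
	// (any value below Entropy) into a concrete, valid test action.
	ConstructValue(encoded *big.Int) Action
}
```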
The encoding would likely be something similar to a BaseX numeric system, where each digit is treated as a bitmap item. The X in BaseX is variable depending on the location (e.g. when representing a field type on a single collection it would currently be Base16 (`NFTypes`)). The X does not need to be an even number and can be as large as we like.
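A minimal sketch of that mixed-radix idea, using hypothetical `encode`/`decode` helpers built on `math/big`:

```go
package main

import (
	"fmt"
	"math/big"
)

// encode packs digits into a single integer, where each position has its
// own base (digits[i] must be < bases[i]).
func encode(digits, bases []int64) *big.Int {
	v := big.NewInt(0)
	for i := range digits {
		v.Mul(v, big.NewInt(bases[i]))
		v.Add(v, big.NewInt(digits[i]))
	}
	return v
}

// decode unpacks an integer back into its per-position digits.
func decode(v *big.Int, bases []int64) []int64 {
	v = new(big.Int).Set(v) // avoid mutating the caller's value
	rem := new(big.Int)
	digits := make([]int64, len(bases))
	for i := len(bases) - 1; i >= 0; i-- {
		v.DivMod(v, big.NewInt(bases[i]), rem)
		digits[i] = rem.Int64()
	}
	return digits
}

func main() {
	// e.g. two field-type digits (Base16, as NFTypes = 16) and a Base3 digit.
	bases := []int64{16, 16, 3}
	digits := []int64{5, 12, 2}
	v := encode(digits, bases)
	fmt.Println(v, decode(v, bases)) // 278 [5 12 2]
}
```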
Encoding/decoding values like this will require mathematical operations on arbitrarily large integers. Go's standard library does provide `math/big` for this; if it proves unsuitable for our needs we may have to write our own, and if we do, we should bear in mind the potential usefulness of such a package to production DefraDB and build accordingly.
Note
I think it would theoretically be possible to generate the entropy for the entire test case given a set of initial limiting params; however, that seems like it would be an unreasonably large value, and it would make encoding/decoding quite a lot more complicated. I might be wrong though, and it would be amazing if we could do that.
Generators
By removing correctness and bias as problems from value generation, selecting values becomes much simpler, and the code should be abstracted away, allowing multiple generators to be implemented and swapped out.
Initially we can likely get away with using a simple random `[]byte` generator; however, long term we can experiment with replacing this with fancier generators such as neural networks - potentially allowing us to predict the actions most likely to catch bugs.
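As a sketch of what that swap-out point might look like (hypothetical names, building on the `ActionType` sketch above):

```go
package proactive

import (
	"crypto/rand"
	"math/big"
)

// Generator picks an encoded action value below the given entropy.
// Fancier implementations (e.g. a neural network) can be swapped in later.
type Generator interface {
	Next(entropy *big.Int) (*big.Int, error)
}

// RandomGenerator is the simple uniform-random starting point.
type RandomGenerator struct{}

func (RandomGenerator) Next(entropy *big.Int) (*big.Int, error) {
	// rand.Int returns a uniformly distributed value in [0, entropy),
	// so every valid action-value is equally likely (no unwanted bias).
	return rand.Int(rand.Reader, entropy)
}
```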
Engine
The action generation engine would coordinate the active generator and the test actions. It will likely take very few (if any) parameters, other than perhaps the action-types and generators to use, and will output test action sets.
It should not be responsible for executing the actions - by splitting test generation and execution, we allow the test generation aspect to be put to multiple uses; initially that would just mean refactoring tests, but in the future it could include benchmarking, the change detector, and other interesting cases.
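A rough sketch of that coordination, reusing the `ActionType` and `Generator` sketches above (the engine only produces action sets; running them is left to the executors):

```go
package proactive

// Engine wires the active generator to the supported action types.
type Engine struct {
	Generator Generator
	Types     []ActionType
}

// GenerateTest produces a set of n test actions against the given database
// state. In a real implementation the choice of action type would itself be
// encoded via the generator; round-robin is a placeholder here.
func (e *Engine) GenerateTest(db any, n int) ([]Action, error) {
	actions := make([]Action, 0, n)
	for i := 0; i < n; i++ {
		actionType := e.Types[i%len(e.Types)]
		entropy := actionType.Entropy([]any{db})
		encoded, err := e.Generator.Next(entropy)
		if err != nil {
			return nil, err
		}
		actions = append(actions, actionType.ConstructValue(encoded))
	}
	return actions, nil
}
```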
Executors
The generated tests will need to be executed in order to be useful. Initially I suggest we focus on comparative testing between branches, to ensure that existing behaviour is preserved.
It is important that the tests executed are recorded somewhere (initially just in the terminal perhaps) so that they can be replayed and examined outside of this action-generation system - otherwise the usefulness of the system would be greatly reduced.
It seems likely that building the first executor will involve extracting shared components from the core integration test suite so that they can be used by both. This will likely also be of benefit to the benchmark suite.
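To make the comparative idea concrete, a very rough sketch follows; the `Instance` abstraction and its `Execute` method are assumptions, and the real entry point will certainly differ:

```go
package proactive

import "fmt"

// Instance abstracts "a DefraDB built from some branch".
type Instance interface {
	Execute(a Action) (result string, err error)
}

// Compare replays the same generated actions against both branches,
// printing each action first so that failing runs can be replayed and
// examined outside this system.
func Compare(refactor, develop Instance, actions []Action) error {
	for i, a := range actions {
		fmt.Printf("action %d: %#v\n", i, a) // record for replay
		got, gotErr := refactor.Execute(a)
		want, wantErr := develop.Execute(a)
		if got != want || (gotErr == nil) != (wantErr == nil) {
			return fmt.Errorf("divergence at action %d: %q vs %q", i, got, want)
		}
	}
	return nil
}
```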
Warning
I am uncertain as to how useful this will end up being; it could prove too unreliable or costly to run, but it could also be amazingly useful both to the core DefraDB team and to other use cases (Source and non-Source). The long term scope of the project is fairly unbounded, and it will likely be very fun to work on. We should be cautious in how much time we invest in this.
Useful links
- https://www.cuemath.com/numbers/binary-multiplication/
- https://stackoverflow.com/questions/30025185/karatsuba-multiplication-optimization-on-byte-array?rq=4
- https://github.com/IBM/graphql-query-generator