ocaml-protoc
Roundtrip tests with QuickCheck
While working with ocaml-protoc, I encountered a message type that caused an RPC deserialization error when the client attempted to decode the server's response. This prompted me to explore implementing roundtrip property tests for ocaml-protoc.
In this PR, I've created a new test directory that adds roundtrip tests using QuickCheck. The property under test is a message's ability to roundtrip through ocaml-protoc serialization (both protobuf binary and JSON). The QuickCheck generation focuses on producing inputs of the message type under test.
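To make the property concrete, here is a minimal sketch of the protobuf half of the roundtrip, assuming a hypothetical `person` message compiled by ocaml-protoc into a `Messages` module:

```ocaml
(* Minimal sketch of the protobuf roundtrip property; [Messages] and [person]
   are hypothetical stand-ins for ocaml-protoc generated code. *)
let roundtrip_pb (p : Messages.person) : bool =
  let encoder = Pbrt.Encoder.create () in
  Messages.encode_pb_person p encoder;
  let bytes = Pbrt.Encoder.to_string encoder in
  p = Messages.decode_pb_person (Pbrt.Decoder.of_string bytes)
```

The JSON half is analogous, going through the generated `encode_json_*`/`decode_json_*` functions, with QuickCheck supplying the generated `person` values.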
This test was able to automagically find the problematic value I was dealing with.
This new test directory is structured as a separate test package and does not introduce new dependencies to the ocaml-protoc codebase. However, it does have a dependency footprint that might not align with your preferences. Therefore, I'm submitting this PR as a draft for your review and to start a discussion.
As for the original issue, I haven't investigated its root cause. I'm also unsure if the message type I'm trying to express is a valid proto specification. I would greatly appreciate your insights on this matter.
The message in question is the `Unit` case of the `UnitOrError` message type.
Thank you for your time and consideration!
Note: Upon further reflection, I realized that I had inadvertently conflated fuzz testing with QuickCheck property testing in my previous description. I've updated the description above to more accurately reflect this distinction.
Also, a note for @c-cube: I just came across qcheck, a QuickCheck implementation that I see you have contributed to. I want to clarify that my selection of a different QuickCheck library (`base_quickcheck`) in this PR was simply due to my prior experience with it, and not the result of comparing the two libraries. This is really just a draft I put together without thinking much about all the details, to get some feedback on the approach first. I am certainly open to exploring qcheck too and would be interested in incorporating it into the tests if you believe it would be a more beneficial dependency for the project.
I would suggest going with the AFL fuzzer and the very good https://github.com/stedolan/crowbar library that supports it. I've had a lot of success finding obscure bugs in my code with this combo. AFL supports a dictionary-based mode which saves some CPU cycles on coming up with protobuf syntax on its own (although it definitely can do that; it can even pull JPEGs out of thin air). It maximizes branch coverage in the tested process by producing new inputs, and it can also minimize the test cases so that you have a minimal set with the same coverage.
I was thinking that we could also add some official protobuf implementation into the loop, like the Golang one, and have ocaml-protoc encode a message and the Golang-generated code decode it, and vice versa; this way we can property-test compatibility with official Google tooling.
Hm, probably AFL is not really required for the roundtrip testing that you suggest; its powers are not really helpful here. Given some protobuf message, quickcheck-style fuzzing is more than sufficient to cover the input space...
We could use AFL/crowbar, though, to come up with a minimal collection of input .proto files that cover most of what ocaml-protoc can parse. Filter out incorrect inputs with some official tool like buf, and feed the rest into ocaml-protoc. This will find cases where ocaml-protoc fails to parse valid .proto code. Maybe we could just reuse an already-created set of .proto files that serves as a test corpus for other protobuf implementations.
Having a decent collection of .proto files, we could come up with a code generation plugin that emits quickcheck code for all the generated types. Actually, you can tell ocaml-protoc to add quickcheck ppx annotations to all types on its own; this way you don't have to duplicate the types in your quickcheck code only to add the necessary ppxes. Having automatically generated quickcheck code for all types, you can just generate the whole testing code for the whole collection of .proto files and run it! This will ensure that whatever is encoded by ocaml-protoc is properly decoded back, and the only thing missing would be compatibility with other tooling.
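For illustration, the generated code could carry the annotation directly so that a generator is derived alongside each type. A sketch, assuming a qcheck-compatible deriver such as ppx_deriving_qcheck and a hypothetical `person` message:

```ocaml
(* Sketch of what generated code might look like if ocaml-protoc injected the
   ppx annotation itself; the field layout of [person] is hypothetical. *)
type person = {
  name: string;
  age: int32;
}
[@@deriving qcheck2]
(* The deriver then produces a QCheck2 generator for [person], so the test
   code doesn't need to duplicate the type just to attach the ppx. *)
```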
Hi @Lupus !
I want to express my gratitude for your innovative thinking and consideration of compatibility with other frameworks. Your ideas have also guided my thoughts in valuable new directions. Thank you!
While I may not be able to fully articulate the essence of my feelings at this time, I find it important to distinguish between compile-time and runtime errors. The prophecy "If it compiles, it works" is well known, but I find myself more concerned about bugs where ocaml-protoc accepts a definition yet generates suspicious code that doesn't roundtrip correctly. These runtime issues, whether due to a failure to reject an invalid proto definition or otherwise, are my primary focus. Compile-time issues, like a supposedly valid proto specification that fails to parse, give us ample time and notice to handle them.
While I currently have reservations about how accurately proto files generated by a fuzzing framework would represent real-world specifications, I recognize this may be due to my own bias and lack of experience with such generation methods. I'm open to the possibility of unexpected outcomes, and I must admit the idea of a solution that generates not only the roundtrip tests with their QuickCheck inputs, but also the type definitions from the start, is quite impressive. For now, my focus will remain on real cases.
Your suggestion about adding ppx deriving directly to the generated code has not gone unnoticed. I understand that injecting ppx deriving constructs could simplify user code. However, I have concerns about cases where a QuickCheck generator would need to be overridden for a specific type, which could potentially undermine the whole approach. This scenario may not occur, but it's something I'm keeping in mind.
I greatly appreciate your long-term vision. If we were to think about near-term, actionable steps while keeping your broader goals in mind, what would they look like? As a first step, I think building a facility to roundtrip proto specifications, whether they're manually collected, gathered from known datasets, or generated by a fuzzer, would be an immediate win.
To reduce the amount of code needed to wrap the generated code, we could introduce a type-class that encapsulates the functionality I've manually defined in user land. For instance:
```ocaml
module T = struct
  type t = Messages.person

  let encode_pb = Messages.encode_pb_person
  let decode_pb = Messages.decode_pb_person
  let encode_json = Messages.encode_json_person
  let decode_json = Messages.decode_json_person
end
```
This binding could be automatically generated and made available as a type-class (perhaps a record) alongside each generated type. This could be done conditionally, based on a new flag for ocaml-protoc. The newly generated value could then be used directly by the tests, thereby minimizing their footprint. This approach would streamline testing by reducing the amount of manual code required.
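A sketch of the record form, with field types mirroring the signatures of `module T` above (the `type_class` name and the generated `person` value are illustrative, not existing code):

```ocaml
(* Sketch of the proposed type-class as a record; all names are hypothetical. *)
type 'a type_class = {
  encode_pb: 'a -> Pbrt.Encoder.t -> unit;
  decode_pb: Pbrt.Decoder.t -> 'a;
  encode_json: 'a -> Yojson.Basic.t;
  decode_json: Yojson.Basic.t -> 'a;
}

(* A value that ocaml-protoc could emit alongside each generated type. *)
let person : Messages.person type_class = {
  encode_pb = Messages.encode_pb_person;
  decode_pb = Messages.decode_pb_person;
  encode_json = Messages.encode_json_person;
  decode_json = Messages.decode_json_person;
}
```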
Thank you again for your valuable input and inspiring vision. I look forward to further discussions on these topics.
Latest Developments:
- I've experimented with `qcheck`. At one point in the PR history, both libraries were operational side by side, and I confirmed that `qcheck` could identify the same problematic input.
- Subsequently, I removed the `base_quickcheck` components, hoping that `qcheck`'s dependencies would be more compatible with the project and the CI.
- I've attempted to incorporate some elements from our discussion. I've included a type class with most of the bindings, but unfortunately the most crucial one (the quickcheck generator `gen`) cannot be generated with ppx_deriving. The obstacle is that the `--ocaml_all_types_ppx` option causes the generator to output the deriving construct in both the `.ml` and the `.mli`, but `ppx_deriving_quickcheck` doesn't seem to support signatures. To make progress, I've relied on the existing structure and continued to create the generator from user land.
- It's worth noting that I'm not familiar with qcheck or the unit-test framework it's likely designed to work with. I don't believe we need `ppx_expect` here, which is primarily used to print out the discovered values. I've enabled the option in the PR to accept external contributions, so if anyone has time and interest, I'd welcome a few commits to convert this into a standard test that best suits the project (perhaps alcotest?).
I'm eager to discuss whether you believe this PR, in its current state, is converging towards something that could be merged. It has already identified some useful problematic values, and the new directory can serve as a repository for new tests as we progress.
For further improvements:
1. Contribute to ppx_deriving_qcheck to make it compatible with `--ocaml_all_types_ppx` (support for mli seems useful for `ppx_deriving_qcheck` in general), or
2. Introduce a new option in `ocaml-protoc` to inject the ppx in `*.ml` files only. This would be a more immediate solution that doesn't necessitate upstream changes.
While both options are viable in the long term, I'm questioning whether it's worth pursuing option 2 in this PR.
A few comments off the top of my head:
Thank you for your work, it is very much appreciated. Your thoughtfulness in comments is also noted :-).
Further improvement 1. would be nice; the ppx for qcheck should work for files with .mli in general. It shouldn't be too hard, as it's one signature item per type. I don't think 2. is useful because then it means we can't access the printers from the outside.
QCheck can be used with alcotest or ounit, but it also has its own runner library in qcheck-core.runner. It just takes a list of tests and CLI arguments, and it might be enough.
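For reference, a minimal use of that runner could look like the sketch below, where `gen_person` and `roundtrip_pb` are hypothetical stand-ins for the generated generator and the property under test:

```ocaml
(* Minimal sketch of driving the tests with qcheck-core.runner. *)
let person_pb_roundtrip : QCheck2.Test.t =
  QCheck2.Test.make ~name:"person.protobuf-roundtrip" ~count:100
    gen_person roundtrip_pb

(* [run_tests_main] parses the CLI arguments (e.g. --verbose, --seed) itself. *)
let () = QCheck_base_runner.run_tests_main [ person_pb_roundtrip ]
```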
> I don't think 2. is useful because then it means we can't access the printers from the outside.
I believe you're referring to the qcheck generators, correct? Just to make sure we're on the same page, the idea that would make option 2 viable in this case is that the generator would be exported via the type_class that the mli already exports. For instance, see in messages.mli:
```ocaml
(** {2 QuickCheck} *)

val quickcheck_person : person Pbrt_quickcheck.Type_class.t
(** [quickcheck_person] provides helpers to test the type person with quickcheck *)
```
This would be accompanied by a modification to the type_class type to include the missing field:
```diff
--- /ocaml-protoc/src/runtime-qcheck/pbrt_quickcheck.ml
+++ /ocaml-protoc/src/runtime-qcheck/pbrt_quickcheck.ml
@@ -1,13 +1,14 @@
 (** Runtime for QuickCheck based tests. *)
 
 (** A type class generated for each type *)
 module Type_class = struct
   type 'a t = {
     pp: Format.formatter -> 'a -> unit;
+    gen: 'a QCheck2.Gen.t;
     equal: 'a -> 'a -> bool;
     encode_pb: 'a -> Pbrt.Encoder.t -> unit;
     decode_pb: Pbrt.Decoder.t -> 'a;
     encode_json: 'a -> Yojson.Basic.t;
     decode_json: Yojson.Basic.t -> 'a;
   }
 end
```
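With the `gen` field available, a generic roundtrip test could be assembled from the type class alone. For instance, a sketch (the helper name is hypothetical and not part of the PR):

```ocaml
(* Sketch: build a protobuf roundtrip QCheck2 test from the extended type class. *)
let make_pb_roundtrip_test ~name (tc : 'a Pbrt_quickcheck.Type_class.t) :
    QCheck2.Test.t =
  QCheck2.Test.make ~name ~print:(Format.asprintf "%a" tc.pp) tc.gen (fun v ->
      let encoder = Pbrt.Encoder.create () in
      tc.encode_pb v encoder;
      let bytes = Pbrt.Encoder.to_string encoder in
      tc.equal v (tc.decode_pb (Pbrt.Decoder.of_string bytes)))
```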
> QCheck can be used with alcotest or ounit, but it also has its own runner library in qcheck-core.runner. It just takes a list of tests and CLI arguments, and it might be enough.
Thanks for this. I plan to try this when I next work on the PR.
I decided to take one more step while the details are still fresh in my mind. I've modified the test to use qcheck's core runner, as you suggested. Here's the outcome:
```
$ dune exec ./src/tests/roundtrip/main.exe -- --verbose --colors
random seed: 130848014
generated error fail pass / total time test name
[✓] 100 0 0 100 / 100 0.0s error.protobuf-roundtrip
[✓] 100 0 0 100 / 100 0.0s error.json-roundtrip
[✓] 100 0 0 100 / 100 1.5s person.protobuf-roundtrip
[✓] 100 0 0 100 / 100 1.5s person.json-roundtrip
[✓] 100 0 0 100 / 100 0.0s empty.protobuf-roundtrip
[✓] 100 0 0 100 / 100 0.0s empty.json-roundtrip
[✓] 100 0 0 100 / 100 0.0s error.protobuf-roundtrip
[✓] 100 0 0 100 / 100 0.0s error.json-roundtrip
[✗] 1 1 0 0 / 100 0.0s unit_or_error.protobuf-roundtrip
[✓] 100 0 0 100 / 100 0.0s unit_or_error.json-roundtrip
=== Error ======================================================================
Test unit_or_error.protobuf-roundtrip errored on (0 shrink steps):
Unit
exception Pbrt.Decoder.Failure(Malformed_variant("unit_or_error"))
================================================================================
failure (0 tests failed, 1 tests errored, ran 10 tests)
```
Currently, I haven't found a way to incorporate this failing test in a manner that doesn't disrupt the build, without resorting to cram tests, which I believe aren't enabled in this repository. This is something to be determined.
Regarding the issue with qcheck2 in `*.mli`: this draft employs a rather crude workaround of filtering out qcheck2 when the generator detects that it's generating an mli. While I don't recommend this for the live version, it allows me to demonstrate progress on the PR.
I believe this latest round of changes brings us significantly closer to the level of automation discussed by @Lupus.
Don't we want to fix the bug just after merging this anyway? I've never needed a failing proptest before.
Do you not use proptests in TDD?
Long-term, users submitting failing tests [^1] in PRs could be a valuable workflow. This lets maintainers prioritize asynchronously and use pre-existing expect tests to see the impact of their changes when they get to them. Seeing the effect of fixes on these tests also improves the code review experience, and surviving tests become effective regression tests [^2].
In the near term, we could pragmatically just disable the only failing item just before merging.
Regarding the bug, I want to manage expectations: I am not currently investigating the root cause. My focus is on making it easier to use ocaml-protoc through my interfacing work on ocaml-grpc, and discussing testing aspects of ocaml-protoc, as I am doing in this PR. I would prefer to avoid delving into the actual runtime for now, if that makes sense.
For the qcheck part, I've opened a PR to add the missing support for signatures, in order to move this PR forward.
[^1]: A test that demonstrates an issue in a reproducible fashion, but doesn't affect the successful exit code of the overall build process.
[^2]: A similar approach seemed to work well in the ocp-indent project.
I don't necessarily use TDD. I like the regression tests part, but I've never before used a failing qcheck property as a positive test (if you make it work, why not though!). Pragmatically, we can comment out/disable the failing test and re-enable it once the bug is fixed; that works for me.
I've added the following:
- The ability to override the QuickCheck generator during test construction.
- The option to disable certain tests from a generated test suite, providing flexibility for specific use cases.
> if you make it work, why not though!
Thanks! I've made a new attempt:
I've observed that the qcheck runner, when executed without the `--verbose` flag, only outputs to stdout. It also provides a `--seed` option for setting a fixed seed for random generation. Leveraging these features, I've managed to run the runner stably, enabling the capture of its output in a dune expect test. It looks like this:
```
(rule
 (target test.outputs)
 (action
  (with-accepted-exit-codes
   (or 0 1)
   (with-outputs-to
    %{target}
    (run ./%{dep:main.exe} --no-colors --seed 382079347)))))
```
To use the `with-accepted-exit-codes` construct, I had to upgrade the `dune-lang` version to `2.2`. I trust this won't pose any issues?
I'm pleased to report that the CI is now passing. Looking forward to your next round of feedback!
That looks good! Dune 2.2 is a reasonable dependency. Do you plan to add more tests, or do you want to mark this PR as ready?
Thank you, @c-cube, for your continued involvement with this PR. I'm OK with the current tests for this PR, and am envisioning a growing collection of tests contributed from users once the directory is merged.
I've published a preview package^1 and listed it^2 for experimentation. I've also integrated it into one of my projects^3.
@Lupus, I'd love to hear your thoughts too before transitioning this PR to the review stage. I'm particularly interested in hearing about applications in a broader environment. Would you consider testing this in one of your protoc interfaces? Thank you!
Hey @mbarbin, sorry for the late response. I doubt I'll have time to test this with our proto interfaces. We don't have a large collection of protos as of now anyway, so I doubt it would bring much value. There is that Google test suite for protobuf; have you tried testing against those?