etcd icon indicating copy to clipboard operation
etcd copied to clipboard

Ensure that input validation between API and Apply is consistent

Open serathius opened this issue 3 years ago • 2 comments

Recent fixes for bugs detected by fuzzing has resulted in improved validation for Apply, for example https://github.com/etcd-io/etcd/pull/13602. Errors during Apply result in panic as raft is unable to progress. Fuzz tests detect those panics within apply function, however this not the only place that fix should be fixed.

Detecting invalid raft message during apply is very late stage. Such messages should be rejected on much earlier stage. Recent fixes has showed that we have no good mechanism to ensure consistency validating messages.

Goal: Ensure that API rejects all requests that would be rejected during apply or caused panic.

Ideas for implementation:

  • We could write a fuzzer test that will generate a lot of randomly filled structs representing API rpc. With rpcs we can push them through both validators and check if they result any inconsistencies.

cc @ptabor

serathius avatar Jan 17 '22 16:01 serathius

Hi @serathius, Is this issue available for being picked up? I have experience in fuzzing and I'm looking for some contributions GoLang. Are these the RPC API: https://github.com/etcd-io/etcd/blob/main/server/etcdserver/api/ to fuzz?

samueleresca avatar Sep 28 '22 22:09 samueleresca

@samueleresca Yes feel free to pick those issues up.

Are these the RPC API: https://github.com/etcd-io/etcd/blob/main/server/etcdserver/api/ to fuzz?

This directory contains all Etcd APIs and some unrelated code (candidate for cleanup). For fuzzying we would like to focus on V3 grpc API in https://github.com/etcd-io/etcd/tree/main/server/etcdserver/api/v3rpc.

Problem that we want to solve is inconsistency in request validation between different stages of processing request. First when request comes to etcdserver, second request is applied in all members of the cluster. In both cases request is validated, however if those two validations is inconsistent it might case whole etcd cluster to panic.

For example Range request is:

  • first validated in API https://github.com/etcd-io/etcd/blob/cd9764a99fab81788d6234d3524a08c9b893c3e4/server/etcdserver/api/v3rpc/key.go#L114-L128
  • and later validated when applying the change https://github.com/etcd-io/etcd/blob/cd9764a99fab81788d6234d3524a08c9b893c3e4/server/etcdserver/txn/txn.go#L179-L194 (will panic if sort target is invalid)

Goal of the task is ensure that every request that passes etcd validation in API, will not cause the apply code to panic. I proposed to setup a fuzz test that will generate a requests which can be then passed to API validation, if it passed then it should be apply able. Test should fail if there is a request that passes validation but causes panic in apply stage.

serathius avatar Sep 29 '22 08:09 serathius

@serathius I opened a draft PR with an example on the RangeRequest - let me know if it makes sense so I can continue with the other input validation.

samueleresca avatar Oct 08 '22 08:10 samueleresca

Thanks, reviewed.

serathius avatar Oct 10 '22 07:10 serathius