prost icon indicating copy to clipboard operation
prost copied to clipboard

Add ability to detect unknown fields

Open akhilles opened this issue 4 years ago • 6 comments

This is useful when handling untrusted inputs. E.g. a client could pack extra fields into messages in order to increase server load. Enforcing a max size limit can help, but it wouldn't address the more subtle cases.

This also allows applications to opt out of forward compatibility and having to deal with unknown fields downstream.

Related to https://github.com/danburkert/prost/issues/2.

akhilles avatar Mar 30 '21 16:03 akhilles

What do you expect prost to do here, beyond what's outlined in #2? Is the idea that unknown fields should fail the decoding? That's a non-starter, it would hamper forwards compatibility guarantees.

danburkert avatar Mar 30 '21 18:03 danburkert

wouldn't address the more subtle cases

What cases?

danburkert avatar Mar 30 '21 18:03 danburkert

What do you expect prost to do here, beyond what's outlined in #2? Is the idea that unknown fields should fail the decoding? That's a non-starter, it would hamper forwards compatibility guarantees.

Couple options that would work without introducing breaking changes:

  • Add a decode_strict function that fails if there are unknown fields.
  • Add a contains_unknown_fields function that returns true if there are unknown fields.
  • Add a decode_unknown_fields function that only decodes unknown fields into some type of map data structure. This would also address #2.
  • Add a KNOWN_TAGS associated constant to Message or a function that returns a &'static [usize] containing all known tags. This might be enough to implement contains_unknown_fields and decode_unknown_fields outside of prost.

What cases?

The client still adds a bunch of fields, but not enough to trigger the max size limit.

akhilles avatar Mar 30 '21 19:03 akhilles

Protecting against untrusted clients needs to be handled at a layer outside of prost. prost only decodes messages from byte buffers, so if you are calling decode the untrusted and potentially too-big input has already been pulled into memory, and any damage it would do, like an OOM scenario, has already been done. This is why e.g. gRPC allows a server to set a max message size, it should enforce that max message size without requiring resources to be allocated for a message exceeding the size.

danburkert avatar Mar 30 '21 20:03 danburkert

Sounds like you are hinting at a second scenario where the server is running an older version of your software, and you want to protect against a client sending a critical new field, and the server silently dropping that. In effect, how to handle adding a new field to an existing message and ensuring it's respected on the server, regardless of version skew. The answer to this is not rejecting all unknown fields, but instead something along the lines of protocol versioning or feature flag negotiation. Again, this is something that would live outside of prost.

danburkert avatar Mar 30 '21 20:03 danburkert

I've just started using prost to decode messages coming from a Kafka topic. My use case is not so much about untrusted sources, but schema evolution.

For example, a producer sends a message with proto schema version 2. Later on, a consumer gets the message but is still using schema version 1. The schema is forward compatible, meaning the only change is a new optional field in v2.

Reading this issue, it sounds like prost will fail because the buffer is too big? But according to the specs this schema evolution should simply lead to the new field being ignored: https://developers.google.com/protocol-buffers/docs/overview#updating

I would expect either:

  • a silent ignore like the specs say (maybe as a config parameter somewhere for backward compatibility?)
  • some kind of extra information that could be inspected and is harmless. For example, protofish goes with this option and puts extra fields in a garbage field: https://docs.rs/protofish/0.3.1/protofish/decode/struct.MessageValue.html

cortopy avatar Apr 15 '21 12:04 cortopy