discussion(connector): persist avro/proto definition in meta
Do we still choose to fetch the descriptor from registry every time we create a new formatter instead of persisting it in meta when creating the sink?
Originally posted by @wenym1 in https://github.com/risingwavelabs/risingwave/pull/12858#discussion_r1366954735
Just to confirm, the benefit of persisting avro/proto definition in meta is not needing to fetch it from schema registry on restart/recovery?
It's also for consistency and stability. If we don't persist the definition in meta and it is changed in the external system, RW may be implicitly affected.
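To make the difference concrete, here is a minimal sketch, not RisingWave code. `MockRegistry`, `SinkCatalog`, `create_sink` and `build_formatter` are all hypothetical names invented for illustration, and the "definition" is just a string. It contrasts re-fetching on every formatter build with reusing a definition captured at sink creation.

```rust
use std::collections::HashMap;

/// Stand-in for an external schema registry (hypothetical, not a RisingWave type).
struct MockRegistry {
    subjects: HashMap<String, String>,
}

impl MockRegistry {
    /// Look up the current definition of a subject, failing if it is gone.
    fn fetch(&self, subject: &str) -> Result<String, String> {
        self.subjects
            .get(subject)
            .cloned()
            .ok_or_else(|| format!("subject `{subject}` not found in registry"))
    }
}

/// Hypothetical catalog entry as it might be stored in meta.
struct SinkCatalog {
    subject: String,
    /// Definition captured at creation time; `None` models re-fetching every time.
    persisted_definition: Option<String>,
}

/// Creating the sink always needs the registry once; `persist` decides
/// whether the fetched definition is kept in the catalog afterwards.
fn create_sink(registry: &MockRegistry, subject: &str, persist: bool) -> Result<SinkCatalog, String> {
    let definition = registry.fetch(subject)?;
    Ok(SinkCatalog {
        subject: subject.to_owned(),
        persisted_definition: persist.then_some(definition),
    })
}

/// Building a formatter (e.g. on restart/recovery): prefer the persisted
/// definition and fall back to the registry only when nothing was persisted.
fn build_formatter(catalog: &SinkCatalog, registry: &MockRegistry) -> Result<String, String> {
    match &catalog.persisted_definition {
        Some(definition) => Ok(definition.clone()),
        None => registry.fetch(&catalog.subject),
    }
}
```

In this sketch, a sink created with `persist = true` keeps producing the same definition after the registry entry is changed or deleted, while one created with `persist = false` silently picks up the change, or fails once the subject disappears.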
Yes, but maybe we just need to persist the version number instead of the whole descriptor set.
I had assumed this was a prerequisite of #11800/#14056/#14057, but that work actually went in another direction: it allows altering to a completely new URL/subject rather than just bumping the version number.
Also some related questions to answer:
- What if not using schema registry?
  - There is no version number and we would have to persist the definition itself.
  - Leave it unhandled because non-registry already has other limitations as well.
- What about other sinks?
  - For example, doris sink also queries downstream for schema before writing to it. Are we also worried it would change implicitly and would like to handle them similarly?
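The two persistence options discussed above (pin a registry version, or store the definition itself when there is no registry) could be modelled roughly as follows. This is only a sketch under assumptions: `PersistedSchema`, `persist_schema` and `needs_registry_on_recovery` are hypothetical names, not existing RisingWave types or functions.

```rust
/// Hypothetical shape of what meta could store for a sink's encode schema.
#[derive(Debug, Clone, PartialEq)]
enum PersistedSchema {
    /// Schema registry in use: pin the subject and the version number only.
    RegistryVersion { subject: String, version: u32 },
    /// No registry: there is no version to pin, so store the definition bytes
    /// (e.g. an Avro schema or a protobuf descriptor set).
    Inline { definition: Vec<u8> },
}

/// Decide what to persist at creation time. A version can only be pinned
/// when both a subject and a version number are known; otherwise the
/// definition itself is kept.
fn persist_schema(subject: Option<&str>, version: Option<u32>, definition: &[u8]) -> PersistedSchema {
    match (subject, version) {
        (Some(subject), Some(version)) => PersistedSchema::RegistryVersion {
            subject: subject.to_owned(),
            version,
        },
        _ => PersistedSchema::Inline {
            definition: definition.to_vec(),
        },
    }
}

/// A pinned version protects against implicit schema changes, but resolving
/// it still requires the registry to be reachable on restart/recovery.
fn needs_registry_on_recovery(schema: &PersistedSchema) -> bool {
    matches!(schema, PersistedSchema::RegistryVersion { .. })
}
```

One trade-off this makes visible: persisting only the version number keeps meta small and guards against implicit changes, but recovery would still depend on the registry being available, whereas an inline definition removes that dependency.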
Strong +1 for this. It would be very counter-intuitive if RisingWave restarts and fails because some schema definition URL is expired/invalid.
By the way, correspondingly, we shall provide a command ALTER TABLE/SOURCE ... REFRESH SCHEMA so that users can update the schema easily.
Related: #15025. After that, REFRESH SCHEMA can be considered a simpler shortcut for altering the schema configs without changing any of them.
When executing CREATE SINK INTO TABLE, the SQL statement must be fully processed again to generate a plan. https://github.com/risingwavelabs/risingwave/blob/9d6594e6a42f64f8f1003a0783ecb0c586c3427c/src/frontend/src/handler/create_sink.rs#L449 In this case, it will also fetch the external definition again, which could be invalid.
I elaborated on what problems may happen when the initial schema is not persisted here:
https://github.com/risingwavelabs/risingwave/pull/18419/files#r1753603090
(And this PR fixed a bug caused by the exact problem