discussion(connector): persist avro/proto definition in meta

Open xiangjinwu opened this issue 2 years ago • 7 comments

Do we still choose to fetch the descriptor from registry every time we create a new formatter instead of persisting it in meta when creating the sink?

Originally posted by @wenym1 in https://github.com/risingwavelabs/risingwave/pull/12858#discussion_r1366954735

xiangjinwu avatar Oct 20 '23 14:10 xiangjinwu

Just to confirm, the benefit of persisting avro/proto definition in meta is not needing to fetch it from schema registry on restart/recovery?

hzxa21 avatar Oct 23 '23 03:10 hzxa21

It's also for consistency and stability. If we don't persist the metadata in meta and the value is changed in the external system, RW may be implicitly affected.
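
To make the consistency concern concrete, here is a minimal sketch of the two strategies. It assumes an in-memory stand-in for the registry; `MockRegistry` and `SinkCatalog` are hypothetical names for illustration, not RisingWave types.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an external schema registry:
/// maps a subject to its latest definition.
struct MockRegistry {
    subjects: HashMap<String, String>,
}

impl MockRegistry {
    fn new() -> Self {
        Self { subjects: HashMap::new() }
    }

    /// Register or overwrite the definition of a subject (an "external change").
    fn set(&mut self, subject: &str, definition: &str) {
        self.subjects.insert(subject.to_owned(), definition.to_owned());
    }

    /// Delete a subject, e.g. because it expired upstream.
    fn remove(&mut self, subject: &str) {
        self.subjects.remove(subject);
    }

    fn fetch(&self, subject: &str) -> Option<String> {
        self.subjects.get(subject).cloned()
    }
}

/// Hypothetical sink catalog entry as it might be stored in meta.
struct SinkCatalog {
    subject: String,
    /// Definition captured once, when the sink is created.
    persisted_definition: String,
}

impl SinkCatalog {
    /// Fetch the definition once at creation time and keep a snapshot.
    fn create(registry: &MockRegistry, subject: &str) -> Option<Self> {
        let persisted_definition = registry.fetch(subject)?;
        Some(Self { subject: subject.to_owned(), persisted_definition })
    }

    /// Strategy 1: go back to the registry whenever a formatter is built.
    fn definition_refetched(&self, registry: &MockRegistry) -> Option<String> {
        registry.fetch(&self.subject)
    }

    /// Strategy 2: use the snapshot persisted at creation time.
    fn definition_persisted(&self) -> &str {
        &self.persisted_definition
    }
}
```

In this sketch, if the subject is overwritten or removed after `SinkCatalog::create`, `definition_refetched` returns the new definition or `None`, while `definition_persisted` keeps returning what was seen at creation.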

wenym1 avatar Oct 24 '23 10:10 wenym1

Yes, but maybe we just need to persist the version number instead of the whole descriptor set.

tabVersion avatar Nov 10 '23 06:11 tabVersion

I had assumed this was a prerequisite of #11800/#14056/#14057, but those actually went in another direction: they allow altering to a completely new URL/subject rather than bumping the version number.

Also some related questions to answer:

  • What if a schema registry is not used?
    • There is no version number, so we would have to persist the definition itself.
    • Or leave it unhandled, because the non-registry path already has other limitations as well.
  • What about other sinks?
    • For example, the doris sink also queries downstream for the schema before writing to it. Are we also worried that it would change implicitly, and would we like to handle it similarly?
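
One way to picture the first question above is a persisted value with two shapes: a pinned registry version, or the definition itself when there is no registry. This is only a sketch under that assumption; `PersistedSchema`, `MockRegistry`, and `resolve` are hypothetical names, not existing RisingWave APIs.

```rust
use std::collections::HashMap;

/// Hypothetical: what meta could persist for a sink's schema.
#[derive(Debug, Clone, PartialEq)]
enum PersistedSchema {
    /// Registry in use: pin the subject and the version seen at creation.
    RegistryVersion { subject: String, version: u32 },
    /// No registry: there is no version number, so keep the definition itself.
    Inline { definition: String },
}

/// Hypothetical registry keyed by (subject, version).
struct MockRegistry {
    versions: HashMap<(String, u32), String>,
}

impl MockRegistry {
    fn new() -> Self {
        Self { versions: HashMap::new() }
    }

    fn register(&mut self, subject: &str, version: u32, definition: &str) {
        self.versions
            .insert((subject.to_owned(), version), definition.to_owned());
    }

    fn fetch_version(&self, subject: &str, version: u32) -> Option<&str> {
        self.versions
            .get(&(subject.to_owned(), version))
            .map(String::as_str)
    }
}

/// Resolve the persisted value to a concrete definition.
/// The pinned variant still needs the registry to be reachable;
/// the inline variant does not.
fn resolve(schema: &PersistedSchema, registry: &MockRegistry) -> Result<String, String> {
    match schema {
        PersistedSchema::RegistryVersion { subject, version } => registry
            .fetch_version(subject, *version)
            .map(str::to_owned)
            .ok_or_else(|| format!("{subject} v{version} not found in registry")),
        PersistedSchema::Inline { definition } => Ok(definition.clone()),
    }
}
```

In this sketch, a pinned version is unaffected by newer versions registered later, but resolving it still fails if the registry no longer serves that subject. The inline variant trades extra storage for independence from the external system.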

xiangjinwu avatar Jan 09 '24 06:01 xiangjinwu

Strong +1 for this. It would be very counter-intuitive if RisingWave restarts and fails because some schema definition URL is expired/invalid.

fuyufjh avatar Mar 01 '24 04:03 fuyufjh

Correspondingly, we should provide a command ALTER TABLE/SOURCE ... REFRESH SCHEMA so that users can update the schema easily.

Related: #15025. After that, REFRESH SCHEMA can be considered a simpler shortcut for altering the schema configs without changing anything.

fuyufjh avatar Mar 01 '24 04:03 fuyufjh

When executing CREATE SINK INTO TABLE, the SQL statement must be fully processed again to generate a plan. https://github.com/risingwavelabs/risingwave/blob/9d6594e6a42f64f8f1003a0783ecb0c586c3427c/src/frontend/src/handler/create_sink.rs#L449 In this case, it will also fetch the external definition again, which could be invalid.

st1page avatar Jun 26 '24 11:06 st1page

I elaborated here on what problems may happen when the initial schema is not persisted:

https://github.com/risingwavelabs/risingwave/pull/18419/files#r1753603090

(And this PR fixed a bug caused by this exact problem.)

xxchan avatar Sep 11 '24 08:09 xxchan