
bug: source should never drop event for missing/redundant fields

Open fuyufjh opened this issue 2 years ago • 1 comments

According to @tabVersion, if an input JSON is missing some fields defined in the source definition (i.e. the `CREATE SOURCE` statement), or contains fields not defined in that definition, the entire JSON will be dropped.

This behavior is quite dangerous. I think we should absorb the incoming data in a best-effort way. For example, fill in NULL for missing columns and ignore the additional columns.

fuyufjh avatar Aug 12 '22 05:08 fuyufjh
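
For illustration, the best-effort behavior proposed above (NULL for missing columns, extra fields ignored) might look like the following minimal Python sketch. It is not RisingWave's actual parser: `SOURCE_COLUMNS` and `parse_row` are hypothetical names, and the sketch assumes each payload is a single JSON object.

```python
import json

# Hypothetical column list standing in for a CREATE SOURCE definition.
SOURCE_COLUMNS = ["id", "name", "price"]


def parse_row(payload, columns=SOURCE_COLUMNS):
    """Map one JSON object onto the declared columns, best-effort."""
    record = json.loads(payload)
    # dict.get returns None (standing in for NULL) when a key is missing.
    # Keys that are not in `columns` are never read, so they are ignored.
    return [record.get(col) for col in columns]


# A payload missing "price" still yields a row.
print(parse_row('{"id": 1, "name": "apple"}'))
# A payload with an undeclared "color" field also yields a row.
print(parse_row('{"id": 2, "name": "pear", "price": 3.5, "color": "green"}'))
```

Neither payload is dropped: the first produces a row with a NULL in the `price` position, and the second produces a row without `color`.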

And DLQ for corrupted bytes that are not even json?

xiangjinwu avatar Aug 12 '22 06:08 xiangjinwu

> And DLQ for corrupted bytes that are not even json?

JSON deserialization is quite deterministic, so I think it is okay to drop corrupted payloads.

tabVersion avatar Aug 14 '22 11:08 tabVersion
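
To make the two options in this exchange concrete, here is a small Python sketch that separates parseable payloads from corrupted ones. The function name `split_corrupted` and the in-memory `dead_letters` list are hypothetical stand-ins for a real dead-letter queue, not anything RisingWave provides. Discarding `dead_letters` corresponds to simply dropping corrupted input.

```python
import json


def split_corrupted(payloads):
    """Return (rows, dead_letters) for a batch of raw payloads."""
    rows, dead_letters = [], []
    for raw in payloads:
        try:
            rows.append(json.loads(raw))
        except ValueError:
            # json.JSONDecodeError and UnicodeDecodeError are both
            # subclasses of ValueError, so this covers malformed JSON
            # text as well as bytes that are not valid UTF-8.
            dead_letters.append(raw)
    return rows, dead_letters


rows, dead_letters = split_corrupted(['{"a": 1}', '{"a": ', b"\x80abc"])
print(rows)
print(dead_letters)
```

Because parsing is deterministic, retrying a payload that lands in `dead_letters` would fail the same way, which is the argument for dropping it rather than queueing it.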

As shown in #4626, the JSON parser behaves as expected. I believe I confused JSON with Debezium JSON earlier. The JSON parser inserts NULL for missing fields and ignores extra fields. Since we have no NOT NULL constraint now, I think there is nothing to change, and we can close the issue.

tabVersion avatar Aug 14 '22 11:08 tabVersion