What's the root cause behind the wiki's recommendation: "Pause consumption when adding a new column"?
The ingestion transformation wiki page mentions the following:
If a new column is added to table or schema configuration during ingestion, incorrect data may appear in the consuming segment.
To ensure accurate values are reloaded, do the following: Pause consumption (and wait for pause status success)
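For readers who want to try the quoted steps, here is a minimal sketch of "pause, then wait for the pause status to succeed". The endpoint paths (`/tables/{table}/pauseConsumption`, `/pauseStatus`, `/resumeConsumption`) and the response fields `pauseFlag` and `consumingSegments` are my assumptions about the controller REST API, so please check them against your controller's Swagger page. The helper names are hypothetical.

```python
import json
import time
import urllib.request


def pause_urls(controller: str, table: str) -> dict:
    """Build the (assumed) controller URLs used to pause and resume consumption."""
    base = f"{controller.rstrip('/')}/tables/{table}"
    return {
        "pause": f"{base}/pauseConsumption",
        "status": f"{base}/pauseStatus",
        "resume": f"{base}/resumeConsumption",
    }


def http_json(method: str, url: str) -> dict:
    """Send a request with no body and decode the JSON response."""
    req = urllib.request.Request(url, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_paused(urls: dict, call=http_json, attempts: int = 30,
                      delay_s: float = 2.0) -> bool:
    """Request a pause, then poll the status until no consuming segments remain."""
    call("POST", urls["pause"])
    for _ in range(attempts):
        status = call("GET", urls["status"])
        # Assumed response shape: paused once the flag is set and the
        # list of consuming segments is empty.
        if status.get("pauseFlag") and not status.get("consumingSegments"):
            return True
        time.sleep(delay_s)
    return False
```

Usage would look like `wait_until_paused(pause_urls("http://localhost:9000", "myTable_REALTIME"))`, followed by the schema update and a `POST` to the `resume` URL. The `call` parameter is injectable so the polling logic can be exercised without a running cluster.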
Is this limited to tables that have an ingestion transform config? If not, it seems like a breaking change, since we would need to pause consumption for every schema update.
Another question: in which scenarios will incorrect data appear?
I think the doc might be outdated. When the table config or schema is updated (e.g. a new column or index is added), a reload is required for segments to pick up the new config and generate the new column or index as needed. For a consuming segment, reload (with the includingConsuming flag set) will try to force commit the segment and start a new consuming segment so that the config change can be picked up.
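As a starting point for testing this, here is a small sketch that builds the reload request URL. The path `/segments/{table}/reload` and the `type` and `forceDownload` query parameters are my assumptions about the controller REST API, and `reload_url` is a hypothetical helper. How the includingConsuming flag mentioned above is passed is deliberately not modeled here, since that should be confirmed against the controller version in use.

```python
from urllib.parse import quote, urlencode


def reload_url(controller: str, table: str, table_type: str = "REALTIME",
               force_download: bool = False) -> str:
    """Build the (assumed) URL for a POST that reloads all segments of a table."""
    query = urlencode({
        "type": table_type,
        "forceDownload": str(force_download).lower(),
    })
    return f"{controller.rstrip('/')}/segments/{quote(table, safe='')}/reload?{query}"


print(reload_url("http://localhost:9000", "myTable"))
```

The resulting URL would be sent as a `POST` after the schema or table config update has been applied.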
Do you want to help try this out and revise the documentation?
cc @kelseiv
Thanks Jackie! I will revise the doc when I have bandwidth to do the test.
@Jackie-Jiang what's the best practice for upsert tables? Is it required to restart servers for both full upsert and partial upsert tables?
For partial upsert tables, I think it's required, since we need to re-initialize the upsert data manager so that the newly added column appears in the upsertHandler.
For upsert tables, we currently have to restart the server in order to pick up table config changes, because the upsert config is associated with the table data manager instead of the segment data manager.
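To illustrate why a reload is not enough here, below is a toy model of the two lifetimes. This is not Pinot code: the class and attribute names are hypothetical, and the only point is that state captured once at the table level is not rebuilt when an individual segment is reloaded.

```python
class SegmentDataManager:
    """Toy stand-in: rebuilt on every reload, so it sees the latest schema."""

    def __init__(self, schema_columns):
        self.columns = list(schema_columns)


class TableDataManager:
    """Toy stand-in: created once at server start and kept until restart."""

    def __init__(self, schema_columns):
        # Upsert state is captured here, at table level, exactly once.
        self.upsert_columns = list(schema_columns)
        self.segments = {}

    def reload_segment(self, name, schema_columns):
        # A reload replaces only the segment-level manager.
        self.segments[name] = SegmentDataManager(schema_columns)


table = TableDataManager(["id", "value"])
table.reload_segment("seg_0", ["id", "value", "new_col"])

print(table.segments["seg_0"].columns)  # includes new_col
print(table.upsert_columns)             # still the columns from startup
```

In this model, only constructing a new `TableDataManager` (the analogue of a server restart) would refresh `upsert_columns`.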