Allow to add derived columns that depends on new columns
Right now Pinot fails when a table is modified in such a way that two columns are added and one of them depend on the other.
More specifically, assuming table T has columns A1, A2 and A3, it should be possible to add columns B and C such as:
//table config
"ingestionConfig": {
"transformConfigs": [
{
"columnName": "C",
"transformFunction": "B + 1"
}
]
}
But right now this fails with:
Caused by: java.lang.RuntimeException: Failed to create derived column: C because argument: B does not exist in the segment
This error is thrown in https://github.com/apache/pinot/blob/85b5779cff8d33412937ceb6d4dfc972b6c8bd97/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/loader/defaultcolumn/BaseDefaultColumnHandler.java#L374 when at least one of the columns used in the transform function is not included in the columns already stored in the segment.
This is not a blocker because the same change can be done by first adding column B, reload and then modify the table again to add column D and reload.
It is important to note that at the same time we add this feature we should add checks to avoid cycle dependencies. Specifically, it should be forbidden to add columns E1, E2, ..., En when E1 depends on En and Ek depends on E(k-1) for all k in [1, n].
It is probably not enough to remove the check in BaseDefaultColumnHandler, we also need to verify that the code that actually computes the value of the derived column actually works.
Chatting with Siddharth Dharm offline, who'd been trying this issue out. As per him this doesn't reproduce with these steps. I tried just now, and it indeed doesn’t reproduce that way. I see both B and C get created.
BUt what I did notice in the process, was that C did not use the function to calculate it's value after B. So both B and C had value Integrer.MIN_INT. This is more apparent when you set a default value to B.
For instance:
I added 2 columns E and F, where F = E + 1. And I put defaultNullValue of 10 for E. But when I query after reload, I see E has value 10 and F has Integer.MIN_INT instead of 11.
Chatting with Siddharth Dharm offline, who'd been trying this issue out. As per him this doesn't reproduce with these steps. I tried just now, and it indeed doesn’t reproduce that way
@npawar that's because of the changes introduced in this PR - https://github.com/apache/pinot/pull/12648.