calcite icon indicating copy to clipboard operation
calcite copied to clipboard

[CALCITE-7122] Eliminate nested calls for idempotent unary functions UPPER/LOWER/ABS/INITCAP/CEIL/FLOOR

Open xuzifu666 opened this issue 5 months ago • 24 comments

jira: https://issues.apache.org/jira/browse/CALCITE-7122

xuzifu666 avatar Aug 03 '25 02:08 xuzifu666

Can we take this discussion to Jira? We need to have a conversation about specification. Those conversations belong in Jira.

Also please STOP logging a jira case just the moment before you push a PR. We know that you have been working on the PR for a few days. Give the rest of us chance to weigh in on the specification before you write the implementation.

Please fix the reference to the jira case. Is it 7122 or 7112?

julianhyde avatar Aug 03 '25 18:08 julianhyde

Can we take this discussion to Jira? We need to have a conversation about specification. Those conversations belong in Jira.

Also please STOP logging a jira case just the moment before you push a PR. We know that you have been working on the PR for a few days. Give the rest of us chance to weigh in on the specification before you write the implementation.

Please fix the reference to the jira case. Is it 7122 or 7112?

OK, thanks for the reminder. I will list the JIRA in advance so that everyone can evaluate it together. @julianhyde

xuzifu666 avatar Aug 04 '25 01:08 xuzifu666

Can idempotent functions such as ceil and floor also be considered together?

xiedeyantu avatar Aug 09 '25 06:08 xiedeyantu

Can idempotent functions such as ceil and floor also be considered together?

The ceil and floor functions have already been separately processed before, and the logic is a little complex, so they should be no need to handle them here.

xuzifu666 avatar Aug 09 '25 07:08 xuzifu666

Can idempotent functions such as ceil and floor also be considered together?

The ceil and floor functions have already been separately processed before, and the logic is a little complex, so they should be no need to handle them here.

I see, these two functions are binary functions. I don't have any other questions for now.

xiedeyantu avatar Aug 09 '25 08:08 xiedeyantu

I see, these two functions are binary functions.

Numeric CEIL and FLOOR are unary. E.g. CEIL(CEIL(3.14)) = CEIL(3.14) = 4.

Datetime CEIL and FLOOR are binary. E.g. FLOOR(FLOOR(DATE '2025-02-08' TO MONTH)) = FLOOR(DATE '2025-02-08' TO MONTH) = DATE '2025-02-01'. The second parameter is a time unit, basically a literal over an enum type, so we could regard the functions as a family of unary functions, e.g. FLOOR_MONTH, CEIL_MONTH, FLOOR_YEAR, etc.

julianhyde avatar Aug 09 '25 19:08 julianhyde

I see, these two functions are binary functions.

Numeric CEIL and FLOOR are unary. E.g. CEIL(CEIL(3.14)) = CEIL(3.14) = 4.

Datetime CEIL and FLOOR are binary. E.g. FLOOR(FLOOR(DATE '2025-02-08' TO MONTH)) = FLOOR(DATE '2025-02-08' TO MONTH) = DATE '2025-02-01'. The second parameter is a time unit, basically a literal over an enum type, so we could regard the functions as a family of unary functions, e.g. FLOOR_MONTH, CEIL_MONTH, FLOOR_YEAR, etc.

I agree with @julianhyde 's idea. I also have a suggestion—though I’m not sure how feasible it is—to consolidate all the currently identified unary idempotent functions into a unified encapsulation while ensuring compatibility with the existing ones in  simplify . The goal would be to extract a shared logic layer to avoid scattering similar implementations across multiple places, which could deduce future maintenance costs. Currently, only four functions are handled uniformly, and without careful review, others might mistakenly treat  CEIL  and  FLOOR  as non-idempotent or assume they were overlooked. If modifying this proves too difficult, we could at least document all the relevant functions in the comments to prevent future developers from thinking the feature was implemented incompletely. I’m not entirely sure if this suggestion is valid. If Julian has time, I’d appreciate it if you could take a look as well.

xiedeyantu avatar Aug 10 '25 00:08 xiedeyantu

Floor and ceil are not purely unary functions also support binary, and they were previously handled separately, so I did not include them in the logic of the unary function list. They were not accidentally omitted, and I think it is not a problem to handle them uniformly in unary way.

But regarding the scalability mentioned,personally I believe that putting all idempotent functions on a unified list and identifying other types is consistent with the current approach. If there is no clear transformation plan, there is no need to abstract it, which may increase the understanding cost for subsequent developers or let someone with a deep understanding of the plan to come up with a clear plan to follow up separately.

xuzifu666 avatar Aug 10 '25 00:08 xuzifu666

what is the status of this PR? It is marked as "changes requested". Have all change request been addressed?

mihaibudiu avatar Sep 24 '25 17:09 mihaibudiu

what is the status of this PR? It is marked as "changes requested". Have all change request been addressed?

IMO it had been addressed all comments now. @mihaibudiu

xuzifu666 avatar Sep 25 '25 09:09 xuzifu666

@xiedeyantu can you please confirm that you are satisfied with these changes?

mihaibudiu avatar Sep 25 '25 16:09 mihaibudiu

@xiedeyantu can you please confirm that you are satisfied with these changes?

@mihaibudiu I'm just concerned that the current implementation might be too loose and not conducive to long-term maintenance. What I was trying to express is whether each function itself could have a common property to indicate whether it is an idempotent function. If there were a way to achieve this point, it would also benefit functional dependency analysis later, such as determining whether a function is an injective function. Or perhaps I'm being too critical - logically I don't think there should be any errors in the current approach. If you think the current implementation is fine, I have no objection.

xiedeyantu avatar Sep 25 '25 22:09 xiedeyantu

So your suggestion is to add a property to SqlFunction which says whether the function is idempotent? Frankly, I think there would be too many such properties to create for various interesting semantic properties of functions that could be used by the optimizer. Moreover, I think that the kinds of programs optimized by this PR never occur in practice, because this is just not how people write programs. So I think this is a lot of work for very little benefit.

mihaibudiu avatar Sep 25 '25 23:09 mihaibudiu

So your suggestion is to add a property to SqlFunction which says whether the function is idempotent? Frankly, I think there would be too many such properties to create for various interesting semantic properties of functions that could be used by the optimizer. Moreover, I think that the kinds of programs optimized by this PR never occur in practice, because this is just not how people write programs. So I think this is a lot of work for very little benefit.

If I understand correctly, you also hope that all functions can have some necessary common properties, although not necessarily the idempotent property. If this PR can achieve minimal intrusive changes, I’m also fine with it. However, I also think that the optimization in the current PR is not very cost-effective.

@mihaibudiu I'm considering exploring whether it's feasible to add some useful attributes to functions. At a certain stage, achieving function dependency capabilities requires handling specific functions, but I'm not sure if adding attributes is a worthwhile effort for Calcite.

xiedeyantu avatar Sep 25 '25 23:09 xiedeyantu

There are some properties like determinism, behavior with nulls, etc. which are very useful for optimizations. Idempotence - perhaps less. I think this should be judged on a case by case basis.

mihaibudiu avatar Sep 25 '25 23:09 mihaibudiu

So let's review this once more and merge it if it works; we are spending a lot of time on an optimization which will probably be never applied in practice.

mihaibudiu avatar Sep 25 '25 23:09 mihaibudiu

What's the status of this PR? Does it fulfill the scope that was agreed upon?

rubenada avatar Oct 08 '25 17:10 rubenada

There is some concern that this is relatively intrusive, making code more complex and hard to maintain, and it does not really solve a practical problem. That's why it wasn't merged yet.

mihaibudiu avatar Oct 08 '25 17:10 mihaibudiu

I would be more confident if someone could supply a real example where this would be beneficial. The only case I can imagine is for machine-generated SQL.

mihaibudiu avatar Oct 08 '25 17:10 mihaibudiu

I would be more confident if someone could supply a real example where this would be beneficial. The only case I can imagine is for machine-generated SQL.

This should be an optimization for functions, similar to generating more efficient SQL. This idempotency elimination is a relatively obvious SQL optimization, but it is hard to say how much efficiency it can improve. For example, some enterprise SQL platforms will first optimize SQL, requiring the most concise and efficient SQL output, and then send it to the compute engine for execution. I think this optimization is profitable, although not much. @mihaibudiu

xuzifu666 avatar Oct 09 '25 02:10 xuzifu666

Can you show a (non-artificial) program written by some person or machine which would benefit from this optimization?

mihaibudiu avatar Oct 09 '25 17:10 mihaibudiu

I hadn't noticed this information before, but I see that it mentions other architectural design suggestions, so we might need to discuss it further.

xuzifu666 avatar Nov 05 '25 02:11 xuzifu666

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 90 days if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

github-actions[bot] avatar Dec 05 '25 03:12 github-actions[bot]