feat: Support decimal for variance
Which issue does this PR close?
- Part of #1545.
Rationale for this change
What changes are included in this PR?
- widen the variance UDF signature to allow decimal arguments while keeping Float64 outputs, and add unit coverage that feeds decimal batches through both population and sample accumulators
- extend aggregate.slt with decimal variance queries and align the expected strings with the Float64 results emitted by the engine
Are these changes tested?
Are there any user-facing changes?
I don't think the changes are doing what you expect here; I believe decimals are still being converted to float for these calculations. What we are looking to do is to perform the calculations natively on decimals without casting them to floats. The function already supports decimal inputs by casting them to floats.
Thanks for pointing that. I just checked out the PR #1408 for the guidance and updated the code. I also used LLM for this.(disclosure)
One of the key requirements that we want here is that decimal inputs are not cast to floats, and the return type remains decimal. I think if you check the tests introduced here you'll see that is not the case currently, hence the code for native decimal implementation isn't actually being used.
Unfortunately I don't currently have the capacity to try guide this PR along; I feel some of my previous comments have still not been addressed and a lot of the code introduced here seems hard to follow along & understand.
Hopefully I'll be able to circle back to this in a few weeks.
Thanks for being patient with this one. This PR has been particularly tricky for me. I appreciate your feedbacks as always and try to incorporate. Let me know whenenver you get the time for moving forward with this, I will try to follow along with your suggestions.
I took another quick look, and I think some of my prior comments were inaccurate, I apologize for that. I was mainly thrown by some parts of the code which lead me down the wrong line of thinking; for example, this signature:
fn variance_signature() -> Signature {
Signature::one_of(
vec![
TypeSignature::Numeric(1),
TypeSignature::Coercible(vec![Coercion::new_exact(
TypeSignatureClass::Decimal,
)]),
],
Volatility::Immutable,
)
}
The coercible part there is redundant since Numeric(1) should already cover decimals; I was mistaken in thinking the new code paths introduced in this PR were not being in effect because of this misconception.
My high level thoughts on this PR now are:
- Would benefit from cleaning up/refactoring code to be more concise
- For example, the new
is_numeric_or_decimal()function seems extremely redundant considering a decimal type is already numeric; having code like this is quite confusing and makes it harder to understand what's actually important in this code that we should be reviewing more carefully
- For example, the new
- Seems odd there is a manual implementation of casting decimal to float here (is that what is happening?)
- Is it possible to maintain the return type as decimal instead of casting to float, or does that introduced precision loss?
I am taking another crack at it and try to refactor the code along with removing the redundant code. The code changes I'm doing still pretty complex to me at the moment and could be adding more complexity. Let's see how it goes. I will push the changes for you to take a look, if it still seems not upto the mark. I can make this as draft and tackle this later on when I have more confidence on the implementation.