Improve monotonicity API
Which issue does this PR close?
Closes #9879.
Rationale for this change
The Vec<Option<bool>>
type used to express the monotonicity of scalar functions with respect to their arguments was difficult to understand.
Note: Currently only mathematical and date-time functions are considered for monotonicity.
What changes are included in this PR?
FuncMonotonicity
has been changed from the type Vec<Option<bool>>
to an enum.
pub enum FuncMonotonicity {
None,
Increasing,
Decreasing,
Mixed(Vec<Option<bool>>),
}
Are these changes tested?
By existing tests.
Are there any user-facing changes?
No.
The problem I see with this representation is that Monotonicity::Increasing
and Monotonicity::Mixed(vec![Some(true), ..., Some(true)])
refer to the same actual configuration. If one writes a check like given_monotonicity == Monotonicity::Increasing
(and I'm sure people will), one will miss the alternative representation (which may easily arise when this information is generated programmatically, unless it is given special treatment).
I think we can find a refactor that increases readability but doesn't introduce such traps and preserves uniqueness. Maybe a struct with an inner Vec<Option<bool>>
and methods that define an API structuring how one interacts with it.
@alamb
@ozankabak Are you suggesting something like this?
use std::ops::Deref;
struct LimitedVector(Vec<Option<bool>>);
impl LimitedVector {
fn new(initial: Vec<Option<bool>>) -> Result<Self, &'static str> {
if initial.len() >= 2 {
Ok(LimitedVector(initial))
} else {
Err("Initial vector length must be 2 or more")
}
}
}
impl Deref for LimitedVector {
type Target = Vec<Option<bool>>;
fn deref(&self) -> &Self::Target {
&self.0
}
}
// Example usage:
fn main() {
let initial_vector = vec![Some(true), None];
match LimitedVector::new(initial_vector) {
Ok(limited_vector) => {
println!("Length: {}", limited_vector.len()); // Length: 2
println!("Elements: {:?}", *limited_vector); // Elements: [Some(true), None]
}
Err(err) => println!("Error: {}", err),
}
}
The problem I see with this representation is that
Monotonicity::Increasing
and Monotonicity::Mixed(vec![Some(true), ..., Some(true)])
refer to the same actual configuration. If one writes a check like given_monotonicity == Monotonicity::Increasing
(and I'm sure people will), one will miss the alternative representation (which may easily arise when this information is generated programmatically, unless it is given special treatment).
I think this is an excellent point
I think we can find a refactor that increases readability but doesn't introduce such traps and preserves uniqueness.
Another potential solution would avoid allocating a Vec
in the common all-increasing/all-decreasing cases. Maybe something like
// Not pub, to ensure only valid representations can be constructed
enum FuncMonotonicityInner {
None,
Increasing,
Decreasing,
Mixed(Vec<Option<bool>>),
}
/// Describes function monotonicity
pub struct FuncMonotonicity(FuncMonotonicityInner);
impl FuncMonotonicity {
pub fn new_none() -> Self { .. }
pub fn new_increasing() -> Self { .. }
pub fn new_decreasing() -> Self { .. }
pub fn new_mixed(inner: Vec<Option<bool>>) -> Self { .. }
/// Returns true if this function is monotonically increasing with respect to argument number `arg`
pub fn arg_increasing(&self, arg: usize) -> bool { .. }
/// Returns true if this function is monotonically decreasing with respect to argument number `arg`
pub fn arg_decreasing(&self, arg: usize) -> bool { .. }
}
In my opinion, the real value of a struct over a typedef is that it reduces the cognitive load of remembering what monotonicity[0] == Some(false)
means -- instead it can be monotonicity.arg_decreasing(0)
, which also has documentation explaining it in case you forget.
I like your new proposal. In this one, new_mixed
would check whether inner consists entirely of Some(true)
/Some(false)
values, and return FuncMonotonicityInner::Increasing
/FuncMonotonicityInner::Decreasing
in those cases.
Instead of FuncMonotonicityInner
, I recommend FuncMonotonicityPartial
to be more in line with math verbiage.
@tinfoil-knight -- are you interested in completing this PR with the proposed API updates?
@alamb Yeah. I just have some college exams this week so might get delayed a bit but I'll pick this up as soon as possible.
No worries -- good luck with the exams!
Converting to draft as it is not waiting on feedback
Thanks @tinfoil-knight -- I will check this out later today
I believe this PR has been superseded by https://github.com/apache/datafusion/pull/10504, which removed the monotonicity API in favor of a more expressive bounds analysis, so closing this PR.
Thanks for the work on this @tinfoil-knight , and sorry we went another way.
BTW @berkaysynnada, in the future, if you are working on a PR that might make another become obsolete, it might be nice to drop a note on the other PR to give the author(s) a heads up.
Sorry for the inconvenience, @tinfoil-knight. My main motivation for #10504 was not specifically related to the monotonicity API, but I had to make changes in those areas to achieve a better overall design. I apologize for not informing you earlier.
No worries @berkaysynnada and @alamb. It's difficult to keep track of everyone's work in such a large project.
It was fun to work on this and I learnt a few things from everyone's review.
Thanks everyone for the communication. I just want everyone working on this project to have a good experience -- I am glad to hear that it was. Thanks again @tinfoil-knight and @berkaysynnada