Improve monotonicity API
Which issue does this PR close?
Closes #9879.
Rationale for this change
The Vec<Option<bool>>
type used to express the monotonicity of scalar functions with respect to their arguments was difficult to understand.
Note: Currently only mathematical and date-time functions are considered for monotonicity.
What changes are included in this PR?
FuncMonotonicity
has been changed from the type Vec<Option<bool>>
to an enum.
pub enum FuncMonotonicity {
None,
Increasing,
Decreasing,
Mixed(Vec<Option<bool>>),
}
Are these changes tested?
By existing tests.
Are there any user-facing changes?
No.
The problem I see with this representation is that Monotonicity::Increasing
and Monotonicity::Mixed(vec![Some(true), ..., Some(true)])
refer to the same actual configuration. If one writes a check like given_monotonicity == Monotonicity::Increasing
(and I'm sure people will), one will miss the alternative representation (which may easily arise when this information is generated programmatically, unless it is given special treatment).
I think we can find a refactor that increases readability but doesn't introduce such traps and preserves uniqueness. Maybe a struct with an inner Vec<Option<bool>>
and methods that define an API structuring how one interacts with it.
@alamb
@ozankabak Are you suggesting something like this?
use std::ops::Deref;
struct LimitedVector(Vec<Option<bool>>);
impl LimitedVector {
fn new(initial: Vec<Option<bool>>) -> Result<Self, &'static str> {
if initial.len() >= 2 {
Ok(LimitedVector(initial))
} else {
Err("Initial vector length must be 2 or more")
}
}
}
impl Deref for LimitedVector {
type Target = Vec<Option<bool>>;
fn deref(&self) -> &Self::Target {
&self.0
}
}
// Example usage:
fn main() {
let initial_vector = vec![Some(true), None];
match LimitedVector::new(initial_vector) {
Ok(limited_vector) => {
println!("Length: {}", limited_vector.len()); // Length: 2
println!("Elements: {:?}", *limited_vector); // Elements: [Some(true), None]
}
Err(err) => println!("Error: {}", err),
}
}
The problem I see with this representation is that
Monotonicity::Increasing
and Monotonicity::Mixed(vec![Some(true), ..., Some(true)])
refer to the same actual configuration. If one writes a check like given_monotonicity == Monotonicity::Increasing
(and I'm sure people will), one will miss the alternative representation (which may easily arise when this information is generated programmatically, unless it is given special treatment).
I think this is an excellent point
I think we can find a refactor that increases readability but doesn't introduce such traps and preserves uniqueness.
Another potential solution would avoid allocating a Vec
in the common all-increasing/all-decreasing cases. Maybe something like
// Not pub, to ensure only valid representations can be constructed
enum FuncMonotonicityInner {
None,
Increasing,
Decreasing,
Mixed(Vec<Option<bool>>),
}
/// Describes function monotonicity
pub struct FuncMonotonicity(FuncMonotonicityInner);
impl FuncMonotonicity {
pub fn new_none() -> Self { .. }
pub fn new_increasing() -> Self { .. }
pub fn new_decreasing() -> Self { .. }
pub fn new_mixed(inner: Vec<Option<bool>>) -> Self { .. }
/// Returns true if this function is monotonically increasing with respect to argument number `arg`
pub fn arg_increasing(&self, arg: usize) -> bool { .. }
/// Returns true if this function is monotonically decreasing with respect to argument number `arg`
pub fn arg_decreasing(&self, arg: usize) -> bool { .. }
}
In my opinion, the real value of a struct over a typedef is that it reduces the cognitive load of remembering what monotonicity[0] == Some(false)
means -- instead it can be monotonicity.arg_decreasing(0)
, which also has documentation explaining it in case you forget.
I like your new proposal. In this one, new_mixed
would check whether inner consists entirely of Some(true)
/Some(false)
values, and return FuncMonotonicityInner::Increasing
/FuncMonotonicityInner::Decreasing
in those cases.
Instead of FuncMonotonicityInner
, I recommend FuncMonotonicityPartial
to be more in line with math verbiage.
@tinfoil-knight -- are you interested in completing this PR with the proposed API updates?
@alamb Yeah. I just have some college exams this week so might get delayed a bit but I'll pick this up as soon as possible.
No worries -- good luck with the exams!
Converting to draft as it is not waiting on feedback
Thanks @tinfoil-knight -- I will check this out later today
I believe this PR has been superseded by https://github.com/apache/datafusion/pull/10504, which removed the monotonicity API in favor of a more expressive bounds analysis, so closing this PR.
Thanks for the work on this @tinfoil-knight , and sorry we went another way.
BTW @berkaysynnada, in the future, if you are working on a PR that might make another become obsolete, it might be nice to drop a note on the other PR to give the author(s) a heads up.
Sorry for the inconvenience, @tinfoil-knight. My main motivation for #10504 was not specifically related to the monotonicity API, but I had to make changes in those areas to achieve a better overall design. I apologize for not informing you earlier.
No worries @berkaysynnada and @alamb. It's difficult to keep track of everyone's work in such a large project.
It was fun to work on this and I learnt a few things from everyone's review.
Thanks everyone for the communication. I just want everyone working on this project to have a good experience -- I am glad to hear that it was. Thanks again @tinfoil-knight and @berkaysynnada