vaex icon indicating copy to clipboard operation
vaex copied to clipboard

🐛 Expression.minmax() not returning proper value for datetimes

Open kmcentush opened this issue 2 years ago • 8 comments

Fixes #1456

I left some comments in the #development Slack channel, as it looks like min(), max() and minmax() were using different approaches. I'm not sure which is more efficient/robust long-term, but I opted for the simpler solution.

Note: In this solution, StatOpMinMax is no longer used anywhere in the code base.

kmcentush avatar Jul 17 '21 20:07 kmcentush

Somehow missed all of those tests failing. Will check again on this PR later tonight.

kmcentush avatar Jul 22 '21 16:07 kmcentush

yeah, seems odd, you could try a rebase.. although I don't remember seeing these issues before.

maartenbreddels avatar Jul 22 '21 16:07 maartenbreddels

The latest commit should fix the issue. I was returning a list instead of the np array like before. Also, my previous implementation didn't respect delayed execution, so I changed the code to use the previous structure and just swapped out the self.limits() call for min() and max(), respectively.

This may suggest that the root case bug in limits() (or more likely, StatOpMinMax?) is still present for my added test case. But as stated above in the original PR comment, this now makes StatOpMinMax dead code.

kmcentush avatar Jul 23 '21 06:07 kmcentush

Not sure why it's failing on non-Windows builds... The errors in the Linux and Mac builds are notebook timeout errors.

kmcentush avatar Jul 24 '21 22:07 kmcentush

Ah don't worry about it. Those errors are unrelated. I hope we can clean them up soon. I don't think that the notebooks run on the windows CI which is why that part does not fail :).

Looks good to me! But let's wait for @maartenbreddels to also give his blessing :)

Thanks!

JovanVeljanoski avatar Jul 25 '21 20:07 JovanVeljanoski

Thanks for this PR! However, this reminds me that we actually need to have a minmax aggregator, now we lose a bit of performance by doing a min and a max separately. @JovanVeljanoski what do you think?

maartenbreddels avatar Aug 04 '21 11:08 maartenbreddels

Ups I completely forgot about this one :(

Well, indeed it would be great to do min and max in one pass over the data. If one is interested only in one of them, is there any performance benefit to doing only min for example? @maartenbreddels

JovanVeljanoski avatar Oct 19 '21 19:10 JovanVeljanoski

Yes, we have both min, max and minmax for that reason.

maartenbreddels avatar Oct 20 '21 10:10 maartenbreddels