
Ensure 2D Array Matrix Profile Outputs

Open seanlaw opened this issue 3 years ago • 7 comments

As we move toward supporting top-k matrix profiles, we need to ensure that our outputs are consistent, which means they need to be 2D instead of 1D.

This is related to #592 and #639

seanlaw, Jul 07 '22 12:07
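Why top-k pushes toward 2D: a top-k matrix profile stores k nearest-neighbor distances (P) and indices (I) per subsequence, so both naturally become arrays of shape (l, k), and k=1 is simply the (l, 1) special case. The brute-force sketch that follows only illustrates this shape convention. It is not STUMPY's implementation (which is far faster), it uses non-normalized Euclidean distance for simplicity, it assumes an exclusion zone of ceil(m / 4) around each subsequence, and the function name `naive_topk_mp` is hypothetical.

```python
import numpy as np


def naive_topk_mp(T, m, k=1):
    """Hypothetical brute-force top-k matrix profile (non-normalized Euclidean).

    Returns P (distances) and I (indices), both with shape (l, k), where
    l = len(T) - m + 1 is the number of subsequences. Requires NumPy >= 1.20.
    """
    T = np.asarray(T, dtype=np.float64)
    l = len(T) - m + 1
    excl_zone = int(np.ceil(m / 4))  # assumed exclusion-zone half-width
    subseqs = np.lib.stride_tricks.sliding_window_view(T, m)  # shape (l, m)

    P = np.full((l, k), np.inf)  # always 2D, even when k == 1
    I = np.full((l, k), -1, dtype=np.int64)
    for i in range(l):
        # Euclidean distance from subsequence i to every subsequence
        D = np.sqrt(np.sum((subseqs - subseqs[i]) ** 2, axis=1))
        # Mask trivial matches inside the exclusion zone around i
        D[max(0, i - excl_zone) : i + excl_zone + 1] = np.inf
        nn = np.argsort(D, kind="stable")[:k]  # k nearest neighbors, ascending
        P[i, : len(nn)] = D[nn]
        # Report -1 for "neighbors" that were only masked (infinite) entries
        I[i, : len(nn)] = np.where(np.isfinite(D[nn]), nn, -1)
    return P, I
```

With k=1 this still returns P and I with shape (l, 1); a caller who wants the familiar 1D profile would take `P[:, 0]`.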

@NimaSarajpoor If you comment on this, then I can assign it to you.

seanlaw, Jul 07 '22 13:07

@seanlaw

> @NimaSarajpoor If you comment on this then I can assign it to you

Sure. I am going to work on this after we develop the top-k matrix profile feature for both the normalized and non-normalized methods.

NimaSarajpoor, Jul 07 '22 15:07

@NimaSarajpoor Part of me is starting to doubt this choice of forcing everything to be 2D. I'm trying to think through the majority of use cases so that we target the most "common" scenario. I'm guessing that 99% of the time users will only care about k=1 (and therefore only about 1D output). What do you think?

And then, in the (less than) 1% of cases, some users will choose k > 1, but they would/should expect the output to be 2D in those cases.

seanlaw, Jul 18 '22 14:07

> I'm guessing that 99% of the time users will only care about k=1 (and therefore they only care about 1D output). What do you think?

I think 2D output for left/right might be a little bit too much.

But what about P and I? Personally, I think we can go with 1D output for k=1 since, as you said, the majority of users care about k=1. Although a user would only need to call .reshape(-1) on a 2D output when k=1, I think 1D is still better because that is what users will probably expect to see.

Your vision is definitely better than mine :) so please ignore what I said if it does not make sense to you 😄

NimaSarajpoor, Jul 18 '22 16:07
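For reference, turning a k=1 profile stored as an (n, 1) column into the 1D (n,) shape users expect is a one-liner in NumPy. This is a minimal sketch; the array values are made up for illustration.

```python
import numpy as np

# A made-up top-1 matrix profile kept as a 2D column, shape (n, 1)
P_2d = np.array([[0.5], [0.2], [0.9], [0.4]])

# Flatten to the 1D shape (n,) when k == 1
P_1d = P_2d.reshape(-1)

# Column selection gives the same result for k == 1,
# and for k > 1 it still picks each subsequence's nearest neighbor
P_first = P_2d[:, 0]

print(P_2d.shape, P_1d.shape)  # (4, 1) (4,)
```

One caveat on the design choice: `reshape(-1)` flattens in row-major order, so for k > 1 it would interleave the k neighbors of consecutive subsequences. It is only a safe "undo" of the 2D layout when k == 1, whereas `P_2d[:, 0]` is correct for any k.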

So, behind the scenes (i.e., with private functions), I think it is fine to just keep everything as 2D. However, when we return P and I separately to the user (e.g., stream.P_), maybe we should check whether P.shape[1] == 1 and k == 1 and, if so, return a 1D array. Otherwise, return 2D. Something like that?

Certainly, for stumpy.stump, where P and I are squashed into a single 2D array, it doesn't matter and we are still good. It's really only in the rare cases where we use a class that it gets tricky.

seanlaw, Jul 18 '22 18:07
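The "private 2D, public 1D when k == 1" idea can be sketched as a small class that stores P and I as (n, k) arrays and squeezes them only in its public accessors. The class name `TopKProfile` and its attributes are hypothetical; `P_` and `I_` merely mimic the trailing-underscore naming of `stream.P_`, and this is not STUMPY's actual streaming class.

```python
import numpy as np


class TopKProfile:
    """Hypothetical container that stores P and I privately as 2D (n, k) arrays."""

    def __init__(self, P, I, k):
        P = np.asarray(P, dtype=np.float64)
        I = np.asarray(I, dtype=np.int64)
        if P.ndim != 2 or P.shape != I.shape or P.shape[1] != k:
            raise ValueError("P and I must both have shape (n, k)")
        self._P, self._I, self._k = P, I, k

    def _public(self, a):
        # Return 1D only for the common k == 1 case; otherwise keep 2D
        if self._k == 1 and a.shape[1] == 1:
            return a[:, 0]
        return a

    @property
    def P_(self):
        return self._public(self._P)

    @property
    def I_(self):
        return self._public(self._I)


top1 = TopKProfile(np.zeros((5, 1)), np.zeros((5, 1)), k=1)
top3 = TopKProfile(np.zeros((5, 3)), np.zeros((5, 3)), k=3)
print(top1.P_.shape, top3.P_.shape)  # (5,) (5, 3)
```

Note that `a[:, 0]` returns a view into the private array, so callers could mutate internal state through it; returning `a[:, 0].copy()` would trade a little memory for that protection.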

> then maybe we should make check to see if P.shape[1] == 1 and k == 1: and then return a 1D array. Otherwise, return 2D. Something like that?

Yeah... that would be a good idea. Most users care about the public API, and it would be better(?) for them to see 1D output for k=1, as this is what a user usually expects in that case.

> Certainly, for stumpy.stump where P and I are squashed into a single 2D array then it doesn't matter and we are still good. It's really on the rare cases where (we use a class) it is tricky.

Correct... that is the tricky part :)

NimaSarajpoor, Jul 18 '22 18:07

Let's continue thinking about it. This is a good exercise in planning out the design and how our decisions may ultimately affect others. My goal is to minimize the pain/problems for the majority of people.

seanlaw, Jul 18 '22 21:07

@NimaSarajpoor Is this technically completed? Can it be closed?

seanlaw, Dec 04 '22 19:12

@seanlaw I believe so. We decided to go with always-2D outputs only for private functions. So, I think it should be okay to close this :)

NimaSarajpoor, Dec 04 '22 20:12

Awesome! Thanks for the confirmation

seanlaw, Dec 04 '22 20:12