Add STIMP (Pan Matrix Profile) Tutorial

Open seanlaw opened this issue 4 years ago • 10 comments

An initial tutorial has already been created here

We need to add a final example where there are two different window sizes within the same dataset. The data can be found here.

seanlaw avatar Jun 16 '21 11:06 seanlaw

@mexxexx Please feel free to provide any feedback here

seanlaw avatar Jun 16 '21 11:06 seanlaw

Which example in the paper? Introduction fig1 or case study fig13?

ken-maeda avatar Jul 25 '22 08:07 ken-maeda

> Which example in the paper? Introduction fig1 or case study fig13?

@ken-maeda The first example in the tutorial reproduces Figure 1 in the paper.

seanlaw avatar Jul 25 '22 10:07 seanlaw

Thanks. Does the tutorial snippet have to be the same as the paper's example snippet, given how huge the original dataset is? The dataset consists of 21 CSV files, and each CSV file has the following columns.

Index(['Time', 'Unix', 'Aggregate', 'Appliance1', 'Appliance2', 'Appliance3',
       'Appliance4', 'Appliance5', 'Appliance6', 'Appliance7', 'Appliance8',
       'Appliance9'],
      dtype='object')

Any of Appliance1 to Appliance9 could be a candidate.

The first CSV file has shape (6960008, 12). The paper's snippet has length 250000, so one CSV file contains 27.84 times as much data as the paper's snippet.

It seems no timestamps are noted for the snippet. Should I just look for a region with a similar data shape?
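A minimal sketch of how one candidate snippet could be pulled out of a single file without loading all of it. The helper `load_snippet`, the start offset, and the tiny in-memory CSV are illustrative assumptions, not part of the tutorial or the dataset; only the column names and the row counts come from the description above.

```python
import io

import numpy as np
import pandas as pd


def load_snippet(source, column, start, length):
    """Read `length` rows of one column, beginning at data row `start`.

    Hypothetical helper: `source` is a path or file-like object.
    """
    df = pd.read_csv(
        source,
        usecols=[column],               # read only the column of interest
        skiprows=range(1, start + 1),   # keep the header (line 0), skip `start` data rows
        nrows=length,                   # stop after `length` rows
    )
    return df[column].to_numpy(dtype=np.float64)


# Size comparison using the numbers quoted above
rows_in_file = 6_960_008
snippet_len = 250_000
ratio = rows_in_file / snippet_len
print(f"one file holds {ratio:.2f} snippets of length {snippet_len}")

# Tiny synthetic stand-in for a real file (assumption, for demonstration only)
demo_csv = io.StringIO(
    "Time,Aggregate,Appliance1\n"
    + "\n".join(f"t{i},{i * 10},{i}" for i in range(10))
)
snippet = load_snippet(demo_csv, "Appliance1", start=3, length=4)
print(snippet)
```

With a real file, the same call would take the file path in place of `demo_csv`; which start offset matches the paper's snippet is still unknown.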

ken-maeda avatar Jul 25 '22 11:07 ken-maeda

[image: img1]

・Reproduction of the paper: the result depends on the location of the snippet and on the max (min) window size passed to stimp.

・Calculation time: I am not sure what level of calculation time is acceptable. For example, it took about 7 minutes with window sizes 100-2000, percentage 0.01, and a dataset length of 250K on an i9-12900KF CPU. This will also depend on the parameters and the dataset length.

I think the following factors have to be roughly decided:

  1. snippet similarity
  2. snippet size
  3. window size (how much difference between the two window sizes is required?)
  4. (percentage)

ken-maeda avatar Aug 01 '22 02:08 ken-maeda

@ken-maeda This issue is for tracking the completion of the pan matrix profile tutorial (it is still incomplete). For questions on how to use stimp, can you please post your questions to our GitHub Discussions?

seanlaw avatar Aug 01 '22 03:08 seanlaw

I intended to ask about the dataset for this tutorial, because the data you posted is too big. I was also trying to complete this tutorial.

ken-maeda avatar Aug 01 '22 04:08 ken-maeda

> Which example in the paper? Introduction fig1 or case study fig13?

@ken-maeda I think there is a misunderstanding. When I said:

> The first example in the tutorial reproduces Figure 1 in the paper.

I was referring to the fact that the first example in this tutorial reproduces Figure 1 in this paper.

In case you are trying to contribute a PR for this tutorial, I think most of the coding work is already completed. The only thing that remains is to add a proper narrative to the tutorial.

seanlaw avatar Aug 01 '22 04:08 seanlaw

Oh, I see. I completely misunderstood. I thought a new case using electrical load data was required. Sorry for the trouble.

ken-maeda avatar Aug 01 '22 04:08 ken-maeda

> Oh, I see. I completely misunderstood. I thought a new case using electrical load data was required. Sorry for the trouble.

No problem @ken-maeda! You may be interested in contributing to #85 instead

seanlaw avatar Aug 01 '22 04:08 seanlaw