stumpy
[WIP] Add Tutorial and Derivations Notebooks for VALMOD #585
This notebook addresses issue #585. Here, we would like to implement the VALMOD method proposed in VALMOD_2018 and VALMOD_2020.
What I have done so far:
- Provide an introduction that gives the gist of the concept proposed in the paper
- Calculate the Lower-Bound distance profile after correcting the typo in eq(1) of the paper, and verify the calculation with an np.random.uniform time series (a sketch of this verification setup is included below).
For now, I calculated the LB given q > 0 (see eq(2) in the paper). However, we still need to find the LB when q <= 0.
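As a minimal sketch (not the notebook's code; the names, lengths, and seed below are illustrative assumptions), the verification setup can look like this: generate an np.random.uniform time series and compute naive z-normalized distance profiles at lengths m and m + k with NumPy. A correct eq(2) lower bound computed from the length-m information (for q > 0) should never exceed the true length-(m + k) distances.

```python
import numpy as np

def z_norm_dist_profile(T, i, m):
    """Naive z-normalized Euclidean distance profile between the subsequence
    T[i : i + m] and every other length-m subsequence of T."""
    windows = np.lib.stride_tricks.sliding_window_view(T, m)
    Z = (windows - windows.mean(axis=1, keepdims=True)) / windows.std(axis=1, keepdims=True)
    return np.sqrt(np.sum((Z - Z[i]) ** 2, axis=1))

# Hypothetical verification harness
rng = np.random.default_rng(0)
T = rng.uniform(size=256)                 # np.random.uniform-style test data
m, k = 32, 8                              # base length and extension
D_m = z_norm_dist_profile(T, 0, m)        # profile at length m
D_mk = z_norm_dist_profile(T, 0, m + k)   # "true" profile at length m + k
# A correct eq(2) lower bound (for q > 0) should satisfy LB <= D_mk elementwise
# for the overlapping subsequences; the LB computation itself lives in the notebook.
```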
@seanlaw
Please allow me some time to see if I can calculate the LB for q <= 0 (see eq(2) of the paper). I will let you know when I am done...
I had a miscalculation. Although there is a typo in the paper, it seems that eq(2) of the paper is correct. I fixed the paper's typo when I was doing the calculation; however, I had a miscalculation somewhere else. After correcting it, I recovered eq(2) for q > 0. I will fix the notebook. (So, I assume the equation should be correct for the case q <= 0 as well.)
Codecov Report
Patch and project coverage have no change.
Comparison is base (275b998) 99.24% compared to head (f6126ca) 99.24%.
Additional details and impacted files
Coverage Diff (main vs. #586):

|          | main   | #586   | +/- |
|----------|--------|--------|-----|
| Coverage | 99.24% | 99.24% |     |
| Files    | 82     | 82     |     |
| Lines    | 12956  | 12956  |     |
| Hits     | 12858  | 12858  |     |
| Misses   | 98     | 98     |     |
@seanlaw
Notebook is ready. The notebook covers the first 12 pages of the VALMOD_2020 paper. I fixed my miscalculation and things are good! I also implemented the Lower-Bound distance profile function for now to see how it performs (we may use it later in the VALMOD algorithm).
Just wanted to let you know that you can ignore the function `_calc_LB_dist_profile` at the end of the notebook. It works, but I don't think it is clean, and I will probably remove it since the VALMOD algorithm does not use such a function. I only created it to compute the Lower-Bound of the distance profile for now and show the result.
I will first need to go over the initial 12 pages myself and then I will review the notebook :)
@NimaSarajpoor I've gone over your notebook quickly but haven't verified the derivation. Usually, with derivations, I like to write things out fully without skipping any steps (see https://github.com/TDAmeritrade/stumpy/blob/main/docs/Matrix_Profile_Derivation.ipynb). Some of your equations don't seem to be rendering for me and it's a bit hard for me to follow. I can try to find some time to work through the derivation to verify your work if that's helpful?
I see. Please let me re-write it. I will try to follow the same approach/style you used in the link you provided. I will check ReviewNB and, if it renders well, I will let you know. Sounds good? (Btw, is it necessary to provide the derivation? I did that because of the typo in the paper, but then I realized the result is the same as what is provided in eq(2) for q > 0.)
Yes, that would be great!
Personally, I think writing out the derivation clearly will help (me) and others reduce any doubt in understanding. Also, I find that it provides an opportunity to help maintain consistency in the code regarding variable names.
Weird...still not rendering well.... please let me do some investigation on my end to see what's going on...
@seanlaw So, I tried the Welford_Review / Matrix_Profile_Derivation / Pearson notebooks to see if ReviewNB on GitHub can render them. Unfortunately, it seems that it cannot render the first two notebooks properly. The Pearson notebook is rendered properly, though!
I guess you wrote the notebooks on your end and pushed them to the repo, and things were good when rendered locally as .ipynb. Did you, by any chance, try to check your notebooks via ReviewNB on GitHub?
It seems the problem is related to ReviewNB on GitHub. I enclosed the math equations with $$ and it seems the problem is resolved. Well, almost resolved! I checked ReviewNB here and it seems there is still one error. I pushed the same notebook to a test repo I created, and that single error does not appear when I check it there with ReviewNB.
@seanlaw Just for the record: if I click on the ReviewNB purple button (at the top of this PR page), it seems there is still one rendering error. However, when I click on the ReviewNB blue hyperlink (at the top of this PR page, just below the purple button) and navigate to the notebook from my fork of STUMPY, everything seems fine and there is no rendering error.
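For illustration only (this is not a cell from the notebook), the fix amounts to enclosing display math in $$ ... $$ in the Markdown cells. The equation shown here is just the standard z-normalized distance / Pearson correlation relation, not eq(2) from the paper:

```markdown
<!-- Markdown cell: enclosing display math in $$ ... $$ renders reliably in ReviewNB -->
$$
d_{i,j} = \sqrt{2\, m \, \left(1 - \rho_{i,j}\right)}
$$
```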
Sounds good
Apologies, these comments are for an older commit. I forgot to hit "Finish Review" along with my last comment.
@NimaSarajpoor I provided some comments and stopped at the "Expanding (3)" line
@seanlaw Thanks for the comments!
> Apologies, these comments are for an older commit. I forgot to hit "Finish Review" along with my last comment.
I found two of those comments in the ReviewNB. Maybe they got mixed together(?). I will address those two comments and then ignore/resolve the rest. Please let me know if I missed anything.
I think we are all set. I can push commits after revising the notebook.
The notebook is ready. I addressed the comments and changed the rest of the notebook accordingly as well.
@seanlaw I was wondering if you could confirm my understanding of Welford's notebook Welford_Review:
In line (23), the S(...) is actually CSS, as its title says "The Corrected Sum of Squares". Correct? And, it is different from the standard deviation S in line (123).
Also...do you know the relationship between \sigma of length m+1 and \sigma of length m, for subsequences starting at the same index? I am planning to derive it, but I want to know if such a relationship has already been derived (so that I do not spend time on it).
> @seanlaw I was wondering if you could confirm my understanding of Welford's notebook Welford_Review:
> In line (23), the S(...) is actually CSS, as its title says "The Corrected Sum of Squares". Correct? And, it is different from the standard deviation S in line (123).
Yes, this seems to be correct. Welford referred to his "Corrected Sum of Squares" as S, and so the initial set of derivations was for Welford's paper. However, once this was completed, I tried to switch over to STUMPY notation in the "Deriving Rolling Window Variance (or Standard Deviation) with Welford's Method" section so that everything was "easier" to follow. Thus, the first half confirms Welford's equations, and this "running mean" and "running variance" is used in the so-called MPX algorithm and is what you will find in the Matrix Profile Derivations notebook.
The second half is for "rolling variance" and should be consumed separately from the "running variance" above. In STUMPY, when you take any time series and try to compute the rolling stddev, we used to compute this using numpy directly. However, for sufficiently long time series, this basically consumed a ton of memory and resulted in things blowing up. So, instead, I switched over to ALSO (coincidentally) using Welford's method to compute a "rolling variance".
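As a rough sketch (assuming a sliding_window_view-based approach, which may not be exactly what STUMPY did before), this is what the direct NumPy rolling stddev looks like and why it gets memory-hungry for long series; the incremental Welford-based alternative is the one mentioned below:

```python
import numpy as np

T = np.random.default_rng(1).uniform(size=10_000)
m = 50

# Direct rolling stddev over all length-m windows. The reduction over the
# (n - m + 1, m) window view allocates temporaries of that full size, which
# is what blows up memory for sufficiently long time series.
windows = np.lib.stride_tricks.sliding_window_view(T, m)
rolling_std_direct = windows.std(axis=1)
```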
> Also...do you know the relationship between \sigma of length m+1 and \sigma of length m, for subsequences starting at the same index? I am planning to derive it, but I want to know if such a relationship has already been derived (so that I do not spend time on it).
Are you referring to Welford still here? If so, I believe that it is still referred to as Welford's Online Algorithm or computing a running variance. Note that there is no explicit "running stddev" derivation and, instead, you can compute "running stddev" by taking the square root of the "running variance"! This is basically what the Welford paper is all about (i.e., computing running variance or running mean)
It's been a while since I've touched it but I realized that the basic concept for computing a "rolling variance" (i.e., the size of the window always stays the same) versus computing a "running variance" (i.e., the size of the window always increments by one) is pretty similar (although the "rolling" version is more complicated than "running"). In STUMPY, we implemented the "rolling variance" (see welford_nanvar in core.py) and we also used "running variance" for stumpy.stump. Let me know if this answers your question
Please note that these rolling/running variance calculations are VERY tricky and are very susceptible to "catastrophic cancellation", so one needs to take care to check the numerical stability!!!
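For reference, here is a minimal sketch of the textbook Welford "running" (online) update, where the window grows by one element at a time. This is illustrative only; it is not STUMPY's welford_nanvar, which handles the "rolling" (fixed-width) case.

```python
import numpy as np

def running_mean_var(T):
    """Welford's online update: after consuming T[:k], return the running
    mean and (population) running variance of the first k points."""
    mean = 0.0
    M2 = 0.0  # running "corrected sum of squares"
    means = np.empty(len(T))
    variances = np.empty(len(T))
    for k, x in enumerate(T, start=1):
        delta = x - mean
        mean += delta / k
        M2 += delta * (x - mean)  # note: uses the *updated* mean
        means[k - 1] = mean
        variances[k - 1] = M2 / k
    return means, variances

T = np.random.default_rng(2).uniform(size=1_000)
means, variances = running_mean_var(T)
# Sanity check against the direct two-pass computation on the full series
assert np.isclose(means[-1], T.mean())
assert np.isclose(variances[-1], T.var())
```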
Thanks for the info! That helps a lot. What I am looking for is the "running variance". So, I will try to be extra careful in my calculation and provide the steps in the notebook so you can check them out as well. (I think finding that relationship should help me find the LB for a non-positive Pearson correlation \rho.)
@NimaSarajpoor Please re-read my response one more time as I was editing it as you were reading. Especially the first paragraph! Apologies for any confusion as I needed to wrap my head around it again :)
> Please re-read my response one more time ... "running variance" is used in the so-called MPX algorithm and is what you will find in the Matrix Profile Derivations notebook
Thanks for mentioning that! In fact, I had checked the MP-Derivation notebook before, but I think it has the running covariance and not the variance (maybe I am missing something).
Update:
Got it now! Cov(T, T) = Var(T)
So, I have running variance! Cool! Thanks!
> So, I have running variance! Cool! Thanks!
Exactly! The running covariance is a superset of the running variance. I'm going to add a note to the Welford Review notebook to make it clear that one is "running" and the other is "rolling" and explain where each one is used in STUMPY.
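A quick NumPy check of the Cov(T, T) = Var(T) point above (illustrative only; np.cov uses ddof=1 by default, hence the matching ddof):

```python
import numpy as np

T = np.random.default_rng(3).uniform(size=500)

# The covariance of a series with itself is its variance, so a running
# covariance routine applied to (T, T) already yields the running variance.
assert np.isclose(np.cov(T, T)[0, 1], np.var(T, ddof=1))
```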
I revised the notebook according to the latest comments. I could not find an easier way to plug the values of the critical point into f(...), so I added a few more steps and some explanation. Please let me know what you think and whether it is easier to follow now.
@NimaSarajpoor Given the simplicity of the lower bound equations, I think that we should separate/split the notebook derivation for the lower bound equations from the VALMOD notebook (i.e., where we apply VALMOD). So, in the VALMOD notebook, we assume that the lower bound equations in the paper are correct, and this is backed up by our LB derivation notebook. Does that make sense? At the end of the day, the VALMOD user should not need to care about the LB derivation.
> Given the simplicity of the lower bound equations, I think that we should separate/split the notebook derivation for the lower bound equations from the VALMOD notebook (i.e., where we apply VALMOD).
I totally agree with this idea, as everything will be in its own place and it can result in two cleaner notebooks. So, should I change the title of this notebook or submit a new PR? Or, do you want to take care of it yourself?
> So, should I change the title of this notebook or submit a new PR? Or, do you want to take care of it yourself?
I've updated the title of this PR and so please feel free to change the title of this notebook and add another notebook for the application of VALMOD
I think we have discussed all the notes. I will push the commits (two notebooks).