stumpy
Fix precision
This PR is a replacement for PR #657, which tackles issue #610.
Regarding PR #657: There was a discrepancy between the files in my local repo and the remote repo, and I couldn't figure out the cause. That is why I am submitting a new PR.
@seanlaw
This is strange! Now, when I run the tests, it says "83 files would be left unchanged", which is the same as what we see here in GitHub Actions. And I get no error when I run the tests in my local repo.
$ ./test.sh custom 1 tests/test_stump.py
Cleaning Up
Checking Black Code Formatting
All done! ✨ 🍰 ✨
83 files would be left unchanged.
Checking Flake8 Style Guide Enforcement
Executing Custom User-Defined Tests Only
Custom Test: 1 / 1
============================= test session starts =============================
platform win32 -- Python 3.9.13, pytest-7.1.2, pluggy-1.0.0
rootdir: C:\Users\AS17983\Desktop\NIMA\stump\stumpy, configfile: pytest.ini
plugins: anyio-3.6.1, cython-0.2.0
collected 27 items
tests\test_stump.py ........................... [100%]
============================= 27 passed in 19.44s =============================
@NimaSarajpoor Instead of running test.sh, can you simply run this in a Jupyter notebook:
import numpy as np
from stumpy import stump
import naive  # lives in the tests/ directory

m = 3
zone = int(np.ceil(m / 4))
seed = 1
np.random.seed(seed)
T = np.random.rand(64)
scale = np.random.choice(np.array([0.001, 0, 1000]), len(T), replace=True)
T[:] = T * scale
ref_mp = naive.stump(T, m, exclusion_zone=zone, row_wise=True)
comp_mp = stump(T, m, ignore_trivial=True)
for i, (ref, cmp) in enumerate(zip(ref_mp[:, 0], comp_mp[:, 0])):
if np.abs(ref - cmp) > 1e-7:
print(i, ref, cmp)
I get:
12 2.950287948697726e-05 0.0
13 1.0329297510446037e-05 0.0
19 6.434999199613844e-07 0.0
22 5.364470225196175e-07 0.0
33 8.523796584076095e-07 0.0
35 1.5414353887031432e-06 0.0
40 1.658550663133749e-07 0.0
60 2.486220486843771e-07 0.0
If I do:
for i, (ref, cmp) in enumerate(zip(ref_mp[:, 0], comp_mp[:, 0])):
print(i, ref, cmp)
Then I get the following:
0 1.5700924586837752e-16 0.0
1 1.5700924586837752e-16 0.0
2 1.658550663133749e-07 1.6323404237781945e-07
3 2.4257150105478956e-07 2.421152116249788e-07
4 6.454152836022342e-07 6.452392069879462e-07
5 0.22155833569278333 0.2215583356927852
6 0.09204964741053696 0.09204964741054301
7 1.9109645596859633e-07 1.8966081829902314e-07
8 1.5700924586837752e-16 0.0
9 2.220446049250313e-16 0.0
10 0.012944862817693177 0.012944867345093258
11 0.49025821651994234 0.4902582165199456
12 2.950287948697726e-05 0.0
13 1.0329297510446037e-05 0.0
14 3.3457789477422945e-05 3.3457703025089785e-05
15 0.26636729814247995 0.26636729659228936
16 3.656951842685141e-08 0.0
17 0.46944298605172013 0.4694429860517184
18 0.03103461052726588 0.031034609854201676
19 6.434999199613844e-07 0.0
20 5.535501007487065e-08 0.0
21 8.19261056164571e-10 0.0
22 5.364470225196175e-07 0.0
23 8.030629626643827e-07 8.021753331240875e-07
24 1.9109645596859633e-07 1.8966081829902314e-07
25 1.5700924586837752e-16 0.0
26 0.0 0.0
27 2.53526511563009e-07 2.0157917229039495e-07
28 8.553525901666268e-08 0.0
29 4.6884218468212213e-07 4.6814316726937914e-07
30 0.10353933637363609 0.10353933656145922
31 0.1051316158248435 0.10513161293237559
32 0.48220646477596524 0.48220646544989193
33 8.523796584076095e-07 0.0
34 1.7325318446717188e-06 1.7323200730701582e-06
35 1.5414353887031432e-06 0.0
36 0.04048989128516726 0.040489912232822924
37 3.656951842685141e-08 0.0
38 3.656951842685141e-08 0.0
39 0.26636729814247995 0.26636729659228936
40 1.658550663133749e-07 0.0
41 4.6884218468212213e-07 4.6814316726937914e-07
42 6.460441508916396e-06 6.458583395855192e-06
43 0.03103461052726588 0.031034609854201676
44 0.10353933637363609 0.10353933656145922
45 0.09204964741053696 0.09204964741054301
46 8.19261056164571e-10 0.0
47 8.523796584076095e-07 8.218635459624336e-07
48 0.1051316158248435 0.10513161293237559
49 0.07416506824467556 0.07416506824466002
50 0.012944862817693177 0.012944867345093258
51 0.48220646477596524 0.48220646544989193
52 0.42052087653681697 0.42052087653681747
53 5.535501007487065e-08 0.0
54 2.7194799110210365e-16 0.0
55 0.0 0.0
56 2.760744472197097e-06 2.7529265174893867e-06
57 0.07416506824467556 0.07416506824466002
58 2.3357370178318246e-07 2.264780425067345e-07
59 0.45226821973180753 0.4522682194826431
60 2.486220486843771e-07 0.0
61 1.5700924586837752e-16 0.0
I get:
12 2.950287948697726e-05 0.0
13 1.0329297510446037e-05 0.0
19 6.434999199613844e-07 0.0
22 5.364470225196175e-07 0.0
33 8.523796584076095e-07 0.0
35 1.5414353887031432e-06 0.0
40 1.658550663133749e-07 0.0
60 2.486220486843771e-07 0.0
I got a different result:
33 8.523796584076095e-07 0.0
35 1.5414353887031432e-06 0.0
40 1.658550663133749e-07 0.0
41 4.6884218468212213e-07 0.0
42 6.460441508916396e-06 0.0
56 2.760744472197097e-06 0.0
60 2.486220486843771e-07 0.0
Btw, I compared my local branch against the remote branch on origin (for both main and fix_precision) to make sure everything is the same. I used the instructions provided in this stackoverflow post. Everything is okay.
I got a different result:
I have a few observations:
- For testing `stump`, should we really be setting `row_wise=True` in `naive.stump`? Since `stump` is diagonal-wise, shouldn't we set `row_wise=False`? This doesn't solve the problem but it is inconsistent.
- `12 2.950287948697726e-05 0.0` means that `naive.stump` is returning a non-zero distance for `idx = 12` in GitHub Actions (and in my local branch of your repo). Can you verify what the pairwise distances are for `idx = 12`? Here are my pairwise distances generated from `naive.stump`:
0 3.0000184482883
1 3.689691694550767e-05
2 2.999981187305404
3 3.0000188123406852
4 3.807877925156031e-05
5 0.9716146011928827
6 3.365385804112601
7 3.00001930941092
8 3.0000184482883
9 3.689691694550766e-05
10 2.7130038423554645
11 1.293751201679764
12 0.0
13 2.9999872380440378
14 3.0000368085323212
15 1.9609012370025143
16 2.999981551371355
17 2.644560505532764
18 0.4142995718310496
19 3.5963320995608045e-05
20 2.9999819288490794
21 3.0000186910572095
22 3.743336396799641e-05
23 2.9999807857662883
24 3.0000192138645323
25 3.0000184482883
26 3.689691694550767e-05
27 3.660682091553413e-05
28 2.9999816124248677
29 3.0000181726311808
30 0.6555850486723159
31 2.8129002184921874
32 3.373545663525392
33 3.9431652231321966e-05
34 2.9999799194833465
35 3.0000200801134347
36 2.979568418597036
37 3.686034742708289e-05
38 2.999981569656451
39 1.735512606848563
40 2.999981270234494
41 3.000017938214321
42 2.950287948697726e-05
43 0.3834710886174519
44 0.7569603930026928
45 3.342379253523891
46 3.000018690647586
47 4.028403188967316e-05
48 2.8729632528630487
49 0.27224456911612394
50 2.7210340045887635
51 3.2311659747454886
52 0.7619746520574434
53 2.9999819565270847
54 3.0000184482882997
55 3.689691694550767e-05
56 2.9999785390824387
57 0.3461178433786073
58 2.9999820733160356
59 2.316949684170337
60 2.9999817367381696
61 3.0000184482883
I generated this by adding the following three lines to naive.stump:
def stump(T_A, m, T_B=None, exclusion_zone=None, row_wise=False):
    if T_B is None:  # self-join
        ignore_trivial = True
        distance_matrix = np.array(
            [distance_profile(Q, T_A, m) for Q in core.rolling_window(T_A, m)]
        )
        T_B = T_A.copy()
    else:
        ignore_trivial = False
        distance_matrix = np.array(
            [distance_profile(Q, T_B, m) for Q in core.rolling_window(T_A, m)]
        )
    distance_matrix[np.isnan(distance_matrix)] = np.inf

    # Added these three lines
    if distance_matrix.shape[0] >= 12:
        for i in range(distance_matrix.shape[1]):
            print(i, distance_matrix[12, i])
According to this, the subsequence at idx = 42 is the one-nearest neighbor of the subsequence at idx = 12
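For clarity, here is a minimal sketch (with hypothetical, randomly generated distance values, not the actual row printed above) of how the non-trivial nearest neighbor is picked from a row of the distance matrix once the exclusion zone around the query index is masked:

```python
import numpy as np

# Hypothetical stand-in for distance_matrix[12, :]
m = 3
zone = int(np.ceil(m / 4))  # exclusion zone radius = 1

np.random.seed(0)
row = np.random.rand(62)
row[12] = 0.0  # the trivial self-match

# Mask the exclusion zone so the trivial match and its neighbors are ignored
excl = np.arange(max(0, 12 - zone), min(len(row), 12 + zone + 1))
row_masked = row.copy()
row_masked[excl] = np.inf

nn = np.argmin(row_masked)  # the reported one-nearest neighbor
```

This is why `12 0.0` in the printout is skipped and the smallest remaining value determines the nearest neighbor.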
- For testing `stump`, should we really be setting `row_wise=True` in `naive.stump`? Since `stump` is diagonal-wise, shouldn't we set `row_wise=False`? This doesn't solve the problem but it is inconsistent.
I think we should. Btw, I tried row_wise=False before and then realized we can set it to row_wise=True IF all we care about is the matrix profile (and not the matrix profile indices). In that case, row_wise=True should be okay. However, as you said, we had better set it to row_wise=False for the sake of consistency.
PLEASE IGNORE THE FOLLOWING COMMENT
2. Here are my pairwise distances generated from `naive.stump`:
from naive.stump or stumpy.stump? Because, the value you provided for idx=12 is 0.0, which is not the same as what you provided earlier: 12 2.950287948697726e-05 0.0 (I think the last value here, i.e. 0.0, is from stumpy.stump)
In my local repo, I got: 12 2.950287948697726e-05 2.950306866348452e-05.
UPDATE:
Okay, I think you set row_wise=False now. Right?
from naive.stump or stumpy.stump? Because, the value you provided for idx=12 is 0.0, which is not the same as what you provided earlier: 12 2.950287948697726e-05 0.0 (I think the last value here, i.e. 0.0, is from stumpy.stump)
From naive.stump. If you look at my last output, you should NOT be looking at 12 0.0, which is the self-match/trivial match. Instead, you should be looking at 42 2.950287948697726e-05
In my local repo, I got: 12 2.950287948697726e-05 2.950306866348452e-05.
This suggests that stumpy.stump is different and not returning 0.0 for some reason
From `naive.stump`. If you look at my last output, you should NOT be looking at `12 0.0`, which is the self-match/trivial match. Instead, you should be looking at `42 2.950287948697726e-05`
Just got that! You are right!
In my local repo, I got: 12 2.950287948697726e-05 2.950306866348452e-05.
This suggests that `stumpy.stump` is different and not returning `0.0` for some reason
That is probably correct. I am going to do what you suggested... in the meantime, could you please remove @njit decorator from stump and provide me with the results of:
for i, (ref, cmp) in enumerate(zip(ref_mp[:, 0], comp_mp[:, 0])):
if np.abs(ref - cmp) > 1e-7:
print(i, ref, cmp)
?
Maybe numba performs differently on different platforms... btw, the code in this branch (in my local repo) still fails for seed=26!
Do you feel we are putting too much energy on this? I mean....how do you feel about it? :)
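As a side note on removing `@njit`: rather than commenting out the decorators one by one, numba can be disabled globally via its `NUMBA_DISABLE_JIT` environment variable, which must be set before any numba-decorated module is imported. A sketch:

```python
import os

# Disable numba JIT compilation globally, so decorated functions run as plain
# Python. This must happen *before* importing stumpy (or any other module
# that applies @njit at import time).
os.environ["NUMBA_DISABLE_JIT"] = "1"
```

Setting the variable in the shell before launching the test runner accomplishes the same thing.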
2. Can you verify what the pairwise distances are for `idx = 12`? Here are my pairwise distances generated from `naive.stump`:
0 3.0000184482883
1 3.689691694550767e-05
2 2.999981187305404
3 3.0000188123406852
4 3.807877925156031e-05
5 0.9716146011928827
6 3.365385804112601
7 3.00001930941092
8 3.0000184482883
9 3.689691694550766e-05
10 2.7130038423554645
11 1.293751201679764
12 0.0
13 2.9999872380440378
14 3.0000368085323212
15 1.9609012370025143
16 2.999981551371355
17 2.644560505532764
18 0.4142995718310496
19 3.5963320995608045e-05
20 2.9999819288490794
21 3.0000186910572095
22 3.743336396799641e-05
23 2.9999807857662883
24 3.0000192138645323
25 3.0000184482883
26 3.689691694550767e-05
27 3.660682091553413e-05
28 2.9999816124248677
29 3.0000181726311808
30 0.6555850486723159
31 2.8129002184921874
32 3.373545663525392
33 3.9431652231321966e-05
34 2.9999799194833465
35 3.0000200801134347
36 2.979568418597036
37 3.686034742708289e-05
38 2.999981569656451
39 1.735512606848563
40 2.999981270234494
41 3.000017938214321
42 2.950287948697726e-05
43 0.3834710886174519
44 0.7569603930026928
45 3.342379253523891
46 3.000018690647586
47 4.028403188967316e-05
48 2.8729632528630487
49 0.27224456911612394
50 2.7210340045887635
51 3.2311659747454886
52 0.7619746520574434
53 2.9999819565270847
54 3.0000184482882997
55 3.689691694550767e-05
56 2.9999785390824387
57 0.3461178433786073
58 2.9999820733160356
59 2.316949684170337
60 2.9999817367381696
61 3.0000184482883
I think they are the same. So, as you suggested, this matter is related to stumpy.stump
in the meantime, could you please remove @njit decorator from stump and provide me with the results of:
After commenting out @njit for _stump and _compute_diagonal, I got:
12 2.950287948697726e-05 0.0
13 1.0329297510446037e-05 0.0
19 6.434999199613844e-07 0.0
22 5.364470225196175e-07 0.0
33 8.523796584076095e-07 0.0
35 1.5414353887031432e-06 0.0
40 1.658550663133749e-07 0.0
60 2.486220486843771e-07 0.0
Do you feel we are putting too much energy on this? I mean....how do you feel about it? :)
I do not feel this way at all! This shouldn't be happening and it would be good to track it down
I think they are the same. So, as you suggested, this matter is related to
stumpy.stump
The good thing is that we've isolated it to the idx=12 and idx=42 pair. Somehow, your stump is computing 2.950306866348452e-05, which is very close to the same answer as naive.stump. Note that we are seeing the same test failures in GitHub Actions but all of them are only failing (first) for Ubuntu and Mac OS. Can you check if it fails for Windows in GitHub Actions (i.e., does it get past `test_stump.py` for Windows)?
I think they are the same. So, as you suggested, this matter is related to `stumpy.stump`

The good thing is that we've isolated it to the `idx=12` and `idx=42` pair. Somehow, your `stump` is computing `2.950306866348452e-05`, which is very close to the same answer as `naive.stump`. Note that we are seeing the same test failures in GitHub Actions but all of them are only failing (first) for Ubuntu and Mac OS.

Can you check if it fails for Windows in GitHub Actions (i.e., does it get past `test_stump.py` for Windows)?
I think it passes
============================= test session starts =============================
platform win32 -- Python 3.10.6, pytest-7.1.3, pluggy-1.0.0
rootdir: D:\a\stumpy\stumpy, configfile: pytest.ini
collected 87 items
tests\test_stump.py ........................... [ 31%]
tests\test_mstump.py ............................................ [ 81%]
tests\test_stumpi.py ................ [100%]
============================= 87 passed in 29.12s =============================
The operation got cancelled for the other versions of Python.
in the meantime, could you please remove @njit decorator from stump and provide me with the results of:
After commenting out `@njit` for `_stump` and `_compute_diagonal`, I got:
12 2.950287948697726e-05 0.0
13 1.0329297510446037e-05 0.0
19 6.434999199613844e-07 0.0
22 5.364470225196175e-07 0.0
33 8.523796584076095e-07 0.0
35 1.5414353887031432e-06 0.0
40 1.658550663133749e-07 0.0
60 2.486220486843771e-07 0.0
And I got:
12 2.950287948697726e-05 0.0
13 1.0329297510446037e-05 0.0
19 6.434999199613844e-07 0.0
22 5.364470225196175e-07 0.0
33 8.523796584076095e-07 0.0
35 1.5414353887031432e-06 0.0
40 1.658550663133749e-07 0.0
47 8.523796584076095e-07 0.0
56 2.760744472197097e-06 0.0
60 2.486220486843771e-07 0.0
Now, without Numba, the values that correspond to idx=12 are the same as what you got. (Note that we STILL use numba to get the mean and std, so MAYBE that is the reason behind the discrepancy between your result and mine.) But this shows that something is happening in numba, right?
@seanlaw
It might be a good idea to create a new branch with a tests directory that only contains test_stump.py, with only the test function of interest. I think this can help us detect the problem more quickly and come up with an MWE (minimal working example) if needed.
Codecov Report
Base: 99.89% // Head: 99.89% // Increases project coverage by +0.00% :tada:
Coverage data is based on head (49651ff) compared to base (cd9c3ff). Patch coverage: 100.00% of modified lines in pull request are covered.
Additional details and impacted files
@@ Coverage Diff @@
## main #668 +/- ##
=======================================
Coverage 99.89% 99.89%
=======================================
Files 80 80
Lines 11453 11538 +85
=======================================
+ Hits 11441 11526 +85
Misses 12 12
| Impacted Files | Coverage Δ | |
|---|---|---|
| stumpy/scrump.py | 100.00% <ø> (ø) | |
| stumpy/config.py | 100.00% <100.00%> (ø) | |
| stumpy/core.py | 100.00% <100.00%> (ø) | |
| stumpy/stump.py | 100.00% <100.00%> (ø) | |
| stumpy/stumped.py | 100.00% <100.00%> (ø) | |
| tests/naive.py | 100.00% <100.00%> (ø) | |
| tests/test_core.py | 100.00% <100.00%> (ø) | |
| tests/test_stump.py | 100.00% <100.00%> (ø) | |
:umbrella: View full report at Codecov.
I get:
12 2.950287948697726e-05 0.0
Could you please let me know what is the nn of 12 in your stumpy.stump? (I think you are using Mac. right?)
Explanation
In naive, the nn is 42. In my local repo, I got 42 as well when I used stumpy.stump. So, I used the subseqs at indices 12 and 42 to create a new time series T that is used in the new test function that I pushed. However, it seems all tests are passing.
So, maybe the T that I used is not a good input to reproduce that error. Or, maybe 42 is NOT the nn of 12 in your end! That is why I asked if you could let me know the nn index of subseq at index 12.
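For reference, here is a sketch of the kind of stitched input described above: reproduce T from the earlier snippet (seed = 1), then concatenate the subsequences at idx = 12 and idx = 42 into a new, shorter time series. The exact T in the pushed test may differ.

```python
import numpy as np

# Rebuild the T from the reproduction snippet earlier in the thread
np.random.seed(1)
T = np.random.rand(64)
scale = np.random.choice(np.array([0.001, 0, 1000]), len(T), replace=True)
T[:] = T * scale

# Stitch the two subsequences of interest together
m = 3
T_new = np.concatenate([T[12:12 + m], T[42:42 + m]])
```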
Could you please let me know what is the nn of 12 in your stumpy.stump? (I think you are using Mac. right?)
When I do:
comp_mp[12]
I get:
array([0.0, 9, 9, 42], dtype=object)
So, stumpy.stump (with @njit ON) produces a left and global nearest neighbor of 9 while the right nearest neighbor is 42
When I print their z-norm subsequences, I get:
# idx = 9
stumpy.core.z_norm(T[9:9+m])
array([-0.70710678, -0.70710678, 1.41421356])
# idx = 12
stumpy.core.z_norm(T[12:12+m])
array([-0.70713287, -0.70708069, 1.41421356])
# idx = 42
stumpy.core.z_norm(T[42:42+m])
array([-0.70711201, -0.70710155, 1.41421356])
You can see that they are somewhat close. The last (3rd) element is identical but the first two elements are only slightly different. stumpy.stump is interpreting idx = 9 as being essentially identical to idx = 12
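To make "essentially identical" concrete, here is a minimal sketch using the printed values above: the z-normalized subsequences at idx = 9 and idx = 12 differ only around the fifth decimal place, so their Euclidean distance is tiny.

```python
import numpy as np

# z-normalized subsequences, copied from the printouts above
s9 = np.array([-0.70710678, -0.70710678, 1.41421356])   # idx = 9
s12 = np.array([-0.70713287, -0.70708069, 1.41421356])  # idx = 12

# Their Euclidean distance is on the order of 1e-5
d = np.linalg.norm(s9 - s12)
```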
@seanlaw
We just got an error caused by slight imprecision in an identical-subsequence case in an about-to-be-merged PR (see error).
I was wondering if we should partially (?) solve this as this error appears from time to time in the merging process. I have a new idea but have not implemented it yet. The concept goes as follows:
if pearson > pearson_threshold:
    # S_A: z-norm of subseq in T_A
    # S_B: z-norm of subseq in T_B
    if np.all(S_A == S_B):
        pearson = 1.0
Do you think the np.all(...) can be time consuming?
Do you think the np.all(...) can be time consuming?
Instead of np.all(S_A == S_B), I think you'd rather use np.allclose (see docs). The S_A == S_B can be expensive because you are forced to always compare EVERY element in S_A with S_B. I'd be willing to bet that np.allclose short circuits and stops checking if the first elements are not equal and is therefore faster.
Regarding the timing, I don't know how time consuming it would be and we should simply check with some different inputs. First:
import numpy as np

T_random = np.random.rand(100_000)
then:
t = np.linspace(0, 50*(2*np.pi), 100_000)
T_sine = np.sin(t) # Essentially, a perfect sine wave repeated 50 times so the check above is very costly
Finally, a mix of both:
percent = 0.1
T_random_sine = np.where(T_random < percent, T_random, T_sine) # Essentially, insert random values only a `percent` of the time
Compare their performance with the check.
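A minimal sketch of a harness for those timing checks, where `has_near_match` is a hypothetical stand-in for whatever comparison ends up in the code:

```python
import timeit

import numpy as np

# Hypothetical stand-in for the near-identity check being timed
def has_near_match(S_A, S_B, atol=1e-8):
    return bool(np.allclose(S_A, S_B, atol=atol))

# Build the three inputs described above
np.random.seed(0)
T_random = np.random.rand(100_000)
t = np.linspace(0, 50 * (2 * np.pi), 100_000)
T_sine = np.sin(t)
percent = 0.1
T_random_sine = np.where(T_random < percent, T_random, T_sine)

# Time the check against the pure sine wave (the costly case)
elapsed = timeit.timeit(lambda: has_near_match(T_random_sine, T_sine), number=10)
```

The same `timeit` call can be repeated for `T_random` and `T_sine` to compare the three cases.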
I'd be willing to bet that np.allclose short circuits and stops checking if the first elements are not equal and is therefore faster.
Hmm, according to this, np.allclose uses all(A == B) behind the scenes. I wonder what numba has implemented behind the scenes. Otherwise, we might want to implement the function ourselves in core.py so that it knows to short circuit at the sign of the first mismatch.
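A hand-rolled comparison along those lines might look like the following sketch (`allclose_shortcircuit` is a hypothetical name, not an existing core.py function):

```python
# Element-wise closeness check that bails out at the first mismatch instead
# of always comparing every element.
def allclose_shortcircuit(a, b, atol=1e-8):
    for x, y in zip(a, b):
        if abs(x - y) > atol:
            return False  # short circuit: stop at the first mismatch
    return True
```

A numba-compiled version of this loop would avoid both the temporary `A == B` array and the full scan when the inputs diverge early.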
@seanlaw Thanks for sharing the links. I took a look at them.
Hmm, according to this, `np.allclose` uses `all(A == B)` behind the scenes
I think you are right. I took a look at the source code and I think they use np.all. I will update you after I do some work on this :)
side note: It might be worthwhile to note that even a small error in $\rho$ can result in a large error in the z-normalized Euclidean distance.
Example:
m = 3 # size of window
ρ_exact = 0.99999999 --> dist_exact = 0.0002
ρ_with_imprecision = 0.99999998 --> dist_with_imprecision = 0.0003
It might help us later with understanding the imprecisions.
It might be worthwhile to note that even a small error in $\rho$ can result in a large error in the z-normalized Euclidean distance.
Hmm, should we have a config.STUMPY_RHO_THRESHOLD instead of the config.STUMPY_CORRELATION_THRESHOLD? Both?
should we have a `config.STUMPY_RHO_THRESHOLD` instead of the `config.STUMPY_CORRELATION_THRESHOLD`?
Did you mean config.STUMPY_DISTANCE_THRESHOLD? (I mean.... aren't RHO and CORRELATION the same thing?)
Also, note that the following numbers
Example:
m = 3 # size of window
ρ_exact = 0.99999999 --> dist_exact = 0.0002
ρ_with_imprecision = 0.99999998 --> dist_with_imprecision = 0.0003
depend on m. You can see that if we change m from 3 to 100, we will have:
> m = 100 # size of window
> ρ_exact = 0.99999999 --> dist_exact = 0.00141
> ρ_with_imprecision = 0.99999998 --> dist_with_imprecision = 0.00199
Now, it is worse!
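The numbers above follow from the relationship between the z-normalized Euclidean distance and the Pearson correlation, dist = sqrt(2 * m * (1 - ρ)), which also appears later in the thread. A quick sketch of the m-dependence:

```python
import math

# dist = sqrt(2 * m * (1 - rho)): the same rho error produces a larger
# distance error as the window size m grows
def rho_to_dist(rho, m):
    return math.sqrt(2 * m * (1 - rho))

d_small_m = rho_to_dist(0.99999999, m=3)    # ~0.000245
d_large_m = rho_to_dist(0.99999999, m=100)  # ~0.00141
```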
So, MAYBE we should set a threshold for pearson, then calculate its corresponding threshold for distance and use it in naive version (to set the distance value below threshold to 0) just for the sake of unit testing.
Did you mean config.STUMPY_DISTANCE_THRESHOLD? (I mean.... aren't RHO and CORRELATION the same thing?)
Oops, you're right!
So, MAYBE we should set a threshold for pearson, then calculate its corresponding threshold for distance and use it in naive version (to set the distance value below threshold to 0) just for the sake of unit testing.
Doesn't this mean that config.STUMPY_CORRELATION_THRESHOLD is too small? Maybe we set it to 1 - 1e-06 instead?
Can you please clarify if the issue is in the naive or performant version?
Doesn't this mean that `config.STUMPY_CORRELATION_THRESHOLD` is too small? Maybe we set it to `1 - 1e-06` instead?

Can you please clarify if the issue is in the naive or performant version?
I think I tried 1 - 1e-06 once but did not help. I will try again. I will also investigate to find the main cause of discrepancy between naive and performant versions.
Did you mean config.STUMPY_DISTANCE_THRESHOLD? (I mean.... aren't RHO and CORRELATION the same thing?)
Oops, you're right
I am thinking about adding something like config.STUMPY_DISTANCE_THRESHOLD=1e-3. I will push commits when I figure things out and then you can see the changes.
config.STUMPY_DISTANCE_THRESHOLD=1e-3
What is this for? What will it be used to control or compare against? Where would you use it in the codebase?
config.STUMPY_DISTANCE_THRESHOLD=1e-3
What is this for? What will it be used to control or compare against? Where would you use it in the codebase?
So, this is what I think:
Maybe we should just set pearson to 1.0 when it exceeds a threshold, instead of re-calculating it. Then, we create a new config variable config.STUMPY_DISTANCE_THRESHOLD which is a function of m and config.STUMPY_PEARSON_THRESHOLD. For instance, if I set config.STUMPY_PEARSON_THRESHOLD to 1-1e-6 and m is 3, then the corresponding config.STUMPY_DISTANCE_THRESHOLD becomes 0.077459. So, we can use it to modify the distance values obtained by the naive version: if a distance falls below 0.077459, we set it to 0.0.
Then, we compare mp of naive and mp of performant.
(I have not implemented this yet)
Maybe, we should just set pearson to 1.0 when it exceeds a threshold, instead of re-calculating it. Then, we create a new config variable config.STUMPY_DISTANCE_THRESHOLD which is a function of m and the config.STUMPY_PEARSON_THRESHOLD. For instance, if I set config.STUMPY_PEARSON_THRESHOLD to 1-1e-6 and m is 3, then the corresponding config.STUMPY_DISTANCE_THRESHOLD becomes 0.077459. So, we can use it to modify the distance values obtained by naive version. If it becomes below 0.077459, we set it to 0.0.
I don't know. This config.STUMPY_DISTANCE_THRESHOLD feels wrong. Additionally, I'm not seeing how you are arriving at 0.077459 from m = 3 and config.STUMPY_PEARSON_THRESHOLD = 1-1e-6.
What I'm observing is that even when config.STUMPY_PEARSON_THRESHOLD = 1-1e-8 (and m = 3), the z-norm distance is still not close to zero.
import math
p = 1-1e-8
m = 3
math.sqrt(2 * m * (1 - p)) # 0.00024494897489372265
0.00024494897489372265 is definitely not zero