stumpy
[WIP] Create arimp.ipynb
#190
Successfully reproduces the short paper, MP.XVII: Indexing the Matrix Profile for Arbitrary Range Queries. There is an issue where a few (0-10) distances are incorrect compared to STUMP over the same arbitrary range, but the figures were accurately reproduced (except for the baseline MP computed by both STUMP and ARIMP; the small differences from the paper are shown in the notebook).
Does not reproduce the extended paper. There are a few issues:
- The dataset for the second case study was sampled hourly when the paper accessed it (2019-6-12), but it is now sampled daily. The daily data is too coarse to reproduce any of the results. This case study tests the relative-inclusive range query.
- The dataset for the third case study is similar to, but not identical to, the paper's data. After trying every combination of interpolation, NaN filling, and other down-sampling on both datasets (there is a 'clean' and a 'processed' dataset), I still found no way of reproducing the paper's data, though the results are similar (power_data.csv, power_data1.csv).
- The MP and top motifs differ from the paper's; this seems to stem from the dataset differences.
- There's an issue with the ARIMP code for relative/absolute-inclusive queries: 208 of the 14,494 distances differ from those calculated by AAMP & STUMP. There is also still the bug with exclusive queries where 0-10 distances (4 for this dataset/window) differ from STUMP's MP.
I'm going to look through the code later to try to find the inclusive-query error, since it seems that, aside from the discrepancy in the datasets, the ARI-MP is functioning. I added non-normalized-MP initialization to ARIMP because the paper's MP looks closer to it.
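For reference, a minimal sketch of the kind of STUMP-vs-ARIMP distance comparison described above (this is not the notebook's actual validation code; m, start, stop, and arimp_P are placeholders):

import numpy as np
import stumpy

# Toy series and window; placeholders rather than the paper's datasets.
T = np.random.rand(10_000)
m, start, stop = 50, 1_000, 6_000

# Baseline: STUMP computed directly over the same arbitrary range.
stump_P = stumpy.stump(T[start:stop], m)[:, 0].astype(np.float64)

# arimp_P would come from the ARIMP range query; the baseline is reused here
# only so that this sketch runs end-to-end.
arimp_P = stump_P.copy()

n_diff = np.count_nonzero(~np.isclose(stump_P, arimp_P))
print(f"{n_diff} of {stump_P.shape[0]} distances differ")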
Check out this pull request on ReviewNB.
See visual diffs & provide feedback on Jupyter Notebooks.
Powered by ReviewNB
Thanks @dylanjprice! Would you mind moving this file into the docs/ directory where all of the notebooks are currently stored? I haven't gone over the paper in detail but please let me know how I can assist or when you're ready for it to be reviewed!
Codecov Report
Merging #519 (1495310) into main (882f90a) will not change coverage. The diff coverage is n/a.
@@ Coverage Diff @@
## main #519 +/- ##
=======================================
Coverage 99.89% 99.89%
=======================================
Files 80 80
Lines 11300 11300
=======================================
Hits 11288 11288
Misses 12 12
Continue to review full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update 882f90a...1495310. Read the comment docs.
Sorry, renamed the branch and closed it. I'll try to get the inclusive query fixed before a review, it's pretty close though. Might need help finding the error, it's odd that only a few distances are different from STUMP's MP.
> Might need help finding the error, it's odd that only a few distances are different from STUMP's MP.
@dylanjprice Sounds good. I am here to help!
> Successfully reproduces the short paper, MP.XVII: Indexing the Matrix Profile for Arbitrary Range Queries. There is an issue where a few (0-10) distances are incorrect compared to STUMP over the same arbitrary range, but the figures were accurately reproduced...
@dylanjprice I know that this is still in progress (so no rush) but I was wondering if you were planning to include this "accurately reproduced" result in the notebook? I have some thoughts of my own but I'd love to see how you made these comparisons and validated the results.
Yep, I'll have everything included, accurately reproduced or not. There's also an annotated copy of the code in 'arimp.html', with the paper's algorithms placed next to the relevant code, to show my thought process. Almost got the bug figured out! I'll let you know as soon as it's ready.
Hey @seanlaw, take a look! Both the datasets and stumpy.stump's MPs show some discrepancies from the paper's; it's all noted in the notebook. The query function now appears to match exactly; I fixed the error. Let me know what you think!
> Both the datasets and stumpy.stump's MPs show some discrepancies from the paper's; it's all noted in the notebook. The query function now appears to match exactly; I fixed the error. Let me know what you think!
@dylanjprice Thank you. Please give me some time to go over it.
@dylanjprice In the notebook, would it be possible to split the first cell of the notebook into separate cells, where the class is in its own cell and the individual functions are each in their own cells? This will make it easier to provide focused feedback within the reviewnb environment (see big purple button above).
My bad, just split it. If there's anything else you need for readability/feedback let me know
@dylanjprice If it's okay with you, I would like to start the first round(s) of feedback focused around the generation of the "arbitrary range indices". While you incorporate those comments, I will find some time to re-read the querying part of the paper. Would that work for you?
Also, please let me know if you have limited time to pursue this further and you would prefer to simply merge the PR and hand things off to me from here. At initial glance, it does look like your work is reproducing the figures in the paper sufficiently well (aside from some data differences) and I would be happy to take it forward. I know that I can be quite particular and I greatly appreciate this contribution and so I leave this decision up to you.
I can absolutely improve it to the library's standards if you don't mind taking time to provide the rounds of feedback. It will also be a huge help to me, because I don't work in a professional coding environment, to see first-hand the particulars of writing proper code. If it appears like it's coming together to your standards I can follow it through to the end, or at a later time you can take over if you think it's necessary. Feel free to modify any parts you need, otherwise I can find the time to incorporate anything commented and appreciate any and all feedback you can provide.
> I can absolutely improve it to the library's standards if you don't mind taking time to provide the rounds of feedback.
I am 100% here for this and I would be more than happy to collaborate!
> I don't work in a professional coding environment, to see first-hand the particulars of writing proper code
I certainly don't know everything either and I can see that you know what you are doing. So, let's learn together!
Given that we currently have the code in the notebook, I wanted to warn you that reviewnb isn't without its flaws as, in the past, comments have disappeared after a new commit is pushed. So, please make sure to:
- Respond to all reviewnb comments inline within reviewnb with something like "Sounds good. I'll handle this in the next commit" once the discussion is clear to you. This way, I can mark those comments as "Resolved".
- Once all reviewnb comments are "Resolved", please leave a message here in the main PR with something like "If you have no further comments, I'm ready to work on/push the next set of commits". And then please wait until I respond in the affirmative before pushing any new commits.
- Never be too shy to ask for clarifications or debate/discuss any choices (even if you feel that it may be trivial). In the end, users will benefit from a well thought out and thoughtfully designed API.
This way, we can stay coordinated and not lose too many conversations due to the limitations of reviewnb. How does that sound?
For this PR, let's continue to leave everything in the notebook until we iron out all of the kinks. I'll eventually ask you to remove some files like the images and csv files before the final merge but let's not worry about that until we figure everything out. After this PR is merged, we can create a new PR to start migrating contents of the notebook over to its own Python module and that's when we'll handle unit testing as well.
Do you have any questions, comments, or concerns?
Ok, sounds great! No questions as of yet; if there's a question/discussion about a comment, I would ask here rather than inline, right? Otherwise that all sounds good, ready to get started anytime.
View / edit / reply to this conversation on ReviewNB
seanlaw commented on 2022-01-23T22:41:32Z ----------------------------------------------------------------
@dylanjprice This is what I'm referring to as an "inline comment" (i.e., comments that are inlined with code/markdown within the notebook). This will appear in the main PR but in actuality, this was written within reviewnb. So, to get to this comment:
- Scroll to the top of the page of the main PR
- Click on big purple "ReviewNB" button
- Then, at the top, click on "docs/arimp.ipynb" which will reveal this notebook along with all of the inline comments
So, for completely general questions (i.e., questions that are unrelated to sections/cells of the notebook), please post in the main PR and not inline. However, when referring to sections/cells of the notebook, please post inline of the ReviewNB environment. Let me know if this makes sense and please feel free to try it out.
dylanjprice commented on 2022-01-24T01:19:54Z ----------------------------------------------------------------
Ah gotcha, sounds good. Makes sense. Completely unrelated: it looks like the images were resized differently from my notebook... they were originally sized the same as the matplotlib plots, sorry about that annoyance.
seanlaw commented on 2022-01-24T01:27:15Z ----------------------------------------------------------------
No worries. All good here!
View entire conversation on ReviewNB
View / edit / reply to this conversation on ReviewNB
seanlaw commented on 2022-01-25T02:52:42Z ----------------------------------------------------------------
I want to preface this comment by clearly acknowledging that your choice of using a class is 100% correct for object oriented Python programming. However, in STUMPY, we've purposely made a design choice to keep things as flat functions whenever possible (think composable, modular NumPy functions rather than sklearn style fit-predict classes) and we've reserved classes exclusively for when it is important to keep track of some continually changing "internal state" (e.g., when we need to iterate or update the state incrementally in stumpy.scrump, stumpy.stumpi , or stumpy.stimp ). The reasoning is that STUMPY tries to provide basic components/building blocks that would allow users the flexibility to deviate from the norm when our API does not account for their use case. Philosophically, we think of STUMPY as being the foundation for building matrix profile applications and we try to focus on computing the matrix profile (or the "arbitrary range indices") as efficiently/accurately as possible, hand that result back to the user, and then get out of their way. We try very hard to "let the developers code" and we try very hard "not to be everything for everyone" (e.g., we purposely avoid offering tools for data preprocessing or data visualization). In this process, our goal is to make as few assumptions as possible while ensuring that we don't do anything that would allow the user to shoot themselves in the foot. <getting off my soapbox now :)>
In the case of "arbitrary range queries" (and similar to the structure of your class), I envision offering only two simple public API functions to the user: stumpy.ari ("arbitrary range indices") and stumpy.arq ("arbitrary range query"). So, the user is able to generate the indices completely independently/separately from querying (where stumpy.arq accepts the output of stumpy.ari as input). Providing a functional API has some additional benefits in that the indices may possibly be used for other purposes in the future (i.e., if the researchers have additional applications that leverage this data structure) and the indices can be saved off into, say, database storage). Anecdotally, this design choice has made it very, very easy to grow STUMPY over the years and we try to limit our public API functions. Of course, users seem to appreciate that they can reach into the toolbox and mix-and-match as they see fit rather than being limited by our public API offerings though our private functions are not technically supported and can change without warning.
dylanjprice commented on 2022-01-27T01:33:42Z ----------------------------------------------------------------
Hey, I've been busy at home, going to take a look at this tomorrow though. That sounds great and I completely appreciate and agree with the modularity of STUMPY, this ARI notebook came about because of some original messing around with the private functions I was doing for other projects. Let me know if there is a specific data structure design you want to use for the arb range indexes/distances; right now, it's just a tuple of Numba lists. The lists seemed unavoidable due to the inability to foresee each ranged index length, the large potential space overhead and the widely different index lengths. I can rearrange the query function to accept the full tuple.
seanlaw commented on 2022-01-27T02:06:35Z ----------------------------------------------------------------
> Hey, I've been busy at home, going to take a look at this tomorrow though.
Absolutely no rush on my end. I prefer quality over speed and I know that it's important to take care of your own needs first. This is all volunteer work so we are grateful that you chose to contribute to our cause!
> That sounds great and I completely appreciate and agree with the modularity of STUMPY, this ARI notebook came about because of some original messing around with the private functions I was doing for other projects. Let me know if there is a specific data structure design you want to use for the arb range indexes/distances; right now, it's just a tuple of Numba lists. The lists seemed unavoidable due to the inability to foresee each ranged index length, the large potential space overhead and the widely different index lengths. I can rearrange the query function to accept the full tuple.
I had seen the numba.typed.List before but had never used it as it was still "experimental" but I think you've made a wonderful choice! I've checked with one of the core developers of numba and they are pretty confident that typed.List are here to stay so we should be good. It is the right data structure for these jagged/ragged arrays. I have some thoughts and ideas to help further iterate/build upon what you've done here but I want to give you time to incorporate things into the foundation that I've proposed above. Otherwise, I'll be throwing way too many things at you at the same time and I want to avoid overwhelming you! :)
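As a tiny illustration of the jagged structure under discussion (illustrative names only; this is not the notebook's code), a numba.typed.List can hold 1-D arrays whose lengths are only known at runtime:

import numpy as np
from numba import njit
from numba.typed import List

@njit
def build_jagged(n):
    SL = List()
    for i in range(n):
        # Each entry can have a different, unforeseeable length,
        # which is exactly the ragged-array case described above.
        SL.append(np.empty(i + 1, dtype=np.float64))
    return SL

SL = build_jagged(5)
print([row.shape[0] for row in SL])  # [1, 2, 3, 4, 5]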
View / edit / reply to this conversation on ReviewNB
seanlaw commented on 2022-01-25T02:52:43Z ----------------------------------------------------------------
Here are a few high level comments:
- Ignore normalize=False for now since we usually handle the normalize=True case first (the non-normalized code is actually separated out in order to keep the code more maintainable)
- We need to keep in mind that there are time series with np.nan/np.inf, and we handle this in STUMPY by calling core.preprocess on the time series. Also take a look at the work that core.mass and core.mass_absolute do to take this into account. Currently, core._mass and core._mass_absolute assume that you have performed the necessary preprocessing of your inputs (for np.inf/np.nan) and that you'll take care to post-process them accordingly
- Additionally, be very careful when setting fastmath=True as this assumes that your inputs/outputs and any math operations performed within the function do NOT contain np.nan/np.inf. See fast math flags. When you do need to perform math with np.nan/np.inf values, you'll want to change this to something more specific like: fastmath={"nsz", "arcp", "contract", "afn", "reassoc"} (a toy illustration follows after this list)
- Whatever we do for the base case, it needs to be written in a way that will require no re-writes for the distributed (Dask) version of the code (i.e., both single server and multi-server should utilize the same internals)
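A toy illustration of that fastmath point (not STUMPY code; nan_aware_min is a made-up helper): with the "nnan"/"ninf" flags omitted, NaN and inf values keep their normal IEEE behavior while the remaining fast-math optimizations stay enabled.

import numpy as np
from numba import njit

@njit(fastmath={"nsz", "arcp", "contract", "afn", "reassoc"})
def nan_aware_min(D):
    best = np.inf
    for i in range(D.shape[0]):
        if D[i] < best:  # comparisons against np.nan are False, so NaNs are skipped
            best = D[i]
    return best

D = np.array([3.0, np.nan, 1.0, np.inf, 2.0])
print(nan_aware_min(D))  # 1.0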
So, we have an older version of stumpy.stump that computes the matrix profile by traversing the distance matrix one row at a time and that is parallelizable (via numba) and distributable (via dask); more importantly, it has many of the safeguards in place to check the inputs before processing. I've had to dig back into historical STUMPY commits (May 2020) to resurrect it (the current implementation is actually around 20% faster), but I've cleaned it up and I'm hoping that you can go over it and build upon this foundation:
I've purposely named the functions ari and aried for you but they are both currently computing matrix profiles and not arbitrary range indices. You'd need to focus on the _ari function for incorporating your code. Let's start here and please let me know if you have questions, comments, or concerns.
dylanjprice commented on 2022-01-27T01:59:17Z ----------------------------------------------------------------
- Okay sounds good. Keep the trivial=True and AB-join functionality though, right?
- Right, I'll add the preprocess in. A couple of other things from STUMPY were skimmed over, too, that I'll insert (exclusion range from config comes to mind.) edit: Okay I see the original Stump/Stumped link, I'll check that out instead of manually inserting any preprocessing or config constants or safeguards.
- Haha you know, I read the fastmath flags, and read that it said, "will assume arguments or results are not NaN/inf. . ." and for some reason thought it meant the Python function's arguments and returned results, not the actual fastmath sub-functions being run. Whoops. I'll take out the inf flag. Is it necessary to remove the NaN flag if the data is being preprocessed?
- Got it. I've taken a look at stumpy.stumped, I'll look through the other distributed functions to get an idea of the distributed design pattern being used.
I'll take a look and incorporate the stump code, and let you know how things are turning out. Thanks for all of the feedback so far
seanlaw commented on 2022-01-27T02:14:21Z ----------------------------------------------------------------
> Okay sounds good. Keep the trivial=True and AB-join functionality though, right?
Yes, if you look at the code I posted, it should already handle the exclusion zones and AB-joins for you (and it is parallelized for numba) and so I would add to and build on top of that
> Right, I'll add the preprocess in. A couple of other things from STUMPY were skimmed over, too, that I'll insert (exclusion range from config comes to mind.) edit: Okay I see the original Stump/Stumped link, I'll check that out instead of manually inserting any preprocessing or config constants or safeguards.
Cool!
> Haha you know, I read the fastmath flags, and read that it said, "will assume arguments or results are not NaN/inf. . ." and for some reason thought it meant the Python function's arguments and returned results, not the actual fastmath sub-functions being run. Whoops. I'll take out the inf flag. Is it necessary to remove the NaN flag if the data is being preprocessed?
Frankly, the docs aren't that clear but I would stick to using fastmath={"nsz", "arcp", "contract", "afn", "reassoc"} since the input time series can contain np.nan values
> Got it. I've taken a look at stumpy.stumped, I'll look through the other distributed functions to get an idea of the distributed design pattern being used.
Please do not hesitate to ask me any questions! I am here to help
@dylanjprice I've left some initial comments for you to consider. Please take a look at your earliest convenience.
So, I've tried out the original STUMP/STUMPED code; there's an easy implementation but it might not be in the style that would fit properly in the library. I can push a commit if you'd like, wasn't sure if the inline comments should be resolved first. Here are some questions/comments on the implementation:
- The lists can be: A. initialized in the "ari" wrapper function and modified in-place by the "_ari" computation for the first and the rest of the distances; B. initialized in the "ari" wrapper, then run the set of four minimum-so-far finders to create the first list of each of the eight lists-of-lists, then add the rest of the lists via "_ari"; or C. initialize the lists in the "_ari" function, compute the first and rest of the distances, then return the full set to the "ari" wrapper. Maybe this is a trivial question, but I wasn't sure what you have in mind with regard to taking out the first distance calculation, changing the output structure and adding the lists. I'm worried to edit the original code too much, haha.
- The min-so-far code was repetitive so now it's just four pranges that call a _minimum_so_far function. That won't affect performance, right?
- In the distributed function, should the workers be split further to calculate these separate indexes or are the pranges enough? If the CPUs should be split further, I think a separate _aried function would need to be created.
Is this all in the direction you were thinking? Waiting to see what you think before I finish the code on this commit.
@dylanjprice Excellent questions. Please give me a little time to formulate a proper response to your points above.
> wasn't sure if the inline comments should be resolved first
If the inline comments are clear then please leave a comment like "Got it. I will resolve this in the next commit". Otherwise, please respond inline if you need further clarification. This way, it lets me know that you've at least read/considered the comments and I am not holding you back.
> Maybe this is a trivial question, but I wasn't sure what you have in mind with regard to taking out the first distance calculation, changing the output structure and adding the lists. I'm worried to edit the original code too much, haha.
So, the basic logic of taking out the first distance calculation was to ensure that we remove as many if/else branching statements as possible inside of _ari and to help keep things flat where possible. This then means that _ari function becomes a bit easier to read and, more importantly, should be more consistent with the original STOMP/GPU-STOMP papers. Additionally, this means that we don't need to test the branching logic because there is no branching. Similar to how the matrix profile out is initialized in ari, I recommend also doing the same thing with your list of indices SL, SLI etc. However, instead of continuously appending to a list (which I feel is unsafe/imprecise - because you can miscount and continually append), I recommend defining a list with an exact size in ari and then you can update direct indices of your list at runtime. Here is an oversimplified example that I hope will convey my thinking:
from numba.typed import List
from numba import njit
import numpy as np


@njit(fastmath=True)
def _get_indices(D):
    i = np.random.randint(0, len(D))
    return D[:i], np.arange(i)


@njit(fastmath=True)
def _ari(SL, SLI, start, stop):
    for i in range(start, stop):
        D = np.random.rand(100)
        SL[i], SLI[i] = _get_indices(D)


def ari(T):
    start = 0
    stop = T.shape[0]
    SL = List([np.empty(0, dtype=np.float64) for _ in range(stop)])
    SLI = List([np.empty(0, dtype=np.int64) for _ in range(stop)])

    # Update first index
    D = np.random.rand(100)
    SL[0], SLI[0] = _get_indices(D)

    # Update all other indices
    _ari(SL, SLI, start + 1, stop)

    return SL, SLI


if __name__ == "__main__":
    T = np.random.rand(1_000_000)
    SL, SLI = ari(T)

    for i in range(5):
        print(SL[i], SLI[i])
So, to answer your first question, I think it makes sense to have a single function whose sole responsibility is to blindly accept a numpy array and then return ALL of the arbitrary range indices. And, sure, this function is allowed to call other sub-functions (set inline="always" as a numba flag) and it can even use parallel=True along with prange. Please let me know if that makes sense.
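As a rough, self-contained illustration of that shape (placeholder names; the real per-index work would build the range indices rather than this toy sum):

import numpy as np
from numba import njit, prange

@njit(inline="always")
def _one_index(T, i, m):
    # Stand-in for the per-subsequence work; inlined into the caller.
    return T[i : i + m].sum()

@njit(parallel=True)
def _compute_all(T, m):
    n = T.shape[0] - m + 1
    out = np.empty(n, dtype=np.float64)
    for i in prange(n):  # each index is processed independently
        out[i] = _one_index(T, i, m)
    return out

T = np.random.rand(1_000)
print(_compute_all(T, 3)[:5])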
> The min-so-far code was repetitive so now it's just four pranges that call a _minimum_so_far function. That won't affect performance, right?
I'm not sure. Let's worry about it after you've implemented it.
> In the distributed function, should the workers be split further to calculate these separate indexes or are the pranges enough? If the CPUs should be split further, I think a separate _aried function would need to be created.
Maybe I'm not understanding your question but we need to keep in mind that "distributed" is referring to the fact that we are sending instructions for a subset of the work to separate servers. So in (_aried):
for i, start in enumerate(range(0, l, step)):
    stop = min(l, start + step)
This for-loop is splitting up the indices so that each worker will be tasked with computing only a subset of the distances. Specifically, the distances between [start + 1: stop], which will increment in chunks of size step. The only thing you'll need to do for aried is make sure to handle the first distance for every chunk so something like:
for i, start in enumerate(range(0, l, step)):
    if np.isinf(μ_Q[start]):
        out[start] = np.inf
    else:
        D = core.mass(T_A[start : start + m], T_B, M_T, Σ_T)
        if ignore_trivial:
            core.apply_exclusion_zone(D, start, excl_zone, np.inf)
        SL[start], SLI[start] = _get_indices(D)  # I only replaced this single line
These are great questions! Let me know if you have any further questions or thoughts. It's important that I am not coming across as "authoritative"; this is only one person's thinking, so please feel free to push back if you disagree. I am mainly here to provide a broader perspective of the entire code base and to ensure that the code is consistently maintained across it.
> there's an easy implementation but it might not be in the style that would fit properly in the library.
If you don't mind iterating, then I don't mind providing the feedback. I'd rather us not overthink "style" in isolation and, instead, let's share it for feedback. It's easier for me to react than to try and read your mind :)
I'll upload the code so you can see what I mean. With regard to the first question, I wanted to create a full set of empty arrays/lists and just update the indexes, but the total space is on the order of 8 * n^2. I couldn't even call the constructor function on my laptop because of the size; right now, it takes only 8 * n to create the empty lists. So that's why I used the jagged lists that are appended to; Eamonn et al. say most ARIs will only reach 8 * n * (log n) in size.
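(For a rough sense of scale, as a back-of-envelope with 8-byte values and an assumed n = 100,000: a dense 8 * n^2 pre-allocation is on the order of 80 GB, whereas 8 * n * log2(n) is roughly 13 MB and 8 * n alone is under 1 MB.)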
For the third question, I meant using workers for the four separate minimum-so-far functions that will be called in _ari. They're pranges for the moment, but could be distributed. I figured no because it would take 4x the original workers but was just asking to make sure.
Whoops let me split the code, too.
Edit: Am I supposed to be just reuploading the notebook and overwriting the old one, or is there something manual that needs to be done to preserve the old versions?