
Arbitrary chunk splitting

Open jmosbacher opened this issue 3 years ago • 1 comments

This implements issue #431. It adds an option to split a chunk's data by overlap with the two target chunks, instead of requiring full containment. The overlapping data is trimmed automatically on concatenation. This should reduce the complexity of chunk alignment for plugins with multiple dependencies and allow subclasses of OverlapWindowPlugin to be processed in parallel.
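To make the difference between the two splitting rules concrete, here is a minimal sketch in plain Python. It does not use strax and is not the code from this PR: rows are reduced to hypothetical `(start, end)` tuples, and `split_contained` / `split_overlapping` are illustrative names only.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), with end exclusive


def split_contained(rows: List[Interval], t: int):
    """Split at t, refusing if any row straddles t (full-containment rule)."""
    if any(start < t < end for start, end in rows):
        raise ValueError(f"a row straddles t={t}; cannot split")
    left = [r for r in rows if r[1] <= t]
    right = [r for r in rows if r[0] >= t]
    return left, right


def split_overlapping(rows: List[Interval], t: int):
    """Split at t, assigning each row to every side it overlaps."""
    left = [r for r in rows if r[0] < t]   # rows that overlap the part before t
    right = [r for r in rows if r[1] > t]  # rows that overlap the part after t
    return left, right


rows = [(0, 10), (10, 20), (20, 30)]
left, right = split_overlapping(rows, 15)
print(left)   # [(0, 10), (10, 20)]
print(right)  # [(10, 20), (20, 30)]
```

With the containment rule the same call fails, because the row `(10, 20)` straddles `t = 15`. With the overlap rule that row ends up in both halves.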

Can you briefly describe how it works?

  • Added an optional allow_overlap argument to the Chunk.split method, which enables splitting on overlap.
  • Added a strict_bounds property to Chunk that marks whether the chunk bounds (start, end) fully contain all of its data.
  • Chunk overlaps are trimmed on concatenation.

Can you give a minimal working example (or illustrate with a figure)?

import strax
import straxen

st = straxen.contexts.demo()
c = next(st.get_iter('180423_1021', 'raw_records'))
idx = len(c.data) // 2  # not important, but let's split approximately at the center
row = c.data[idx]
t = row['time'] + row['dt'] // 2  # select a time that falls within the record interval
try:
    c1, c2 = c.split(t)
except strax.CannotSplit:
    print("Previous splitting logic fails.")

c1, c2 = c.split(t, allow_overlap=True)  # with allow_overlap=True the split succeeds
assert c1.end == c2.start == t  # the split is done exactly at the requested point in time
assert c1.data['time'][-1] > c2.data['time'][0]  # the two resulting chunks overlap each other
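
For the trimming step, one possible rule can be sketched in plain Python as follows. This is an assumption for illustration, not the implementation in this PR: rows are hypothetical `(start, end)` tuples, `concat_trimmed` is a made-up name, and a row that starts before the boundary is assumed to be owned by the left chunk.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), with end exclusive


def concat_trimmed(left: List[Interval], right: List[Interval],
                   boundary: int) -> List[Interval]:
    """Concatenate two overlapping chunks, dropping duplicated rows.

    Assumption: the left chunk already holds every row that starts before
    `boundary`, so copies of those rows in the right chunk are dropped.
    """
    trimmed_right = [r for r in right if r[0] >= boundary]
    return left + trimmed_right


left = [(0, 10), (10, 20)]    # left chunk, its last row extends past the boundary
right = [(10, 20), (20, 30)]  # right chunk, its first row starts before the boundary
merged = concat_trimmed(left, right, boundary=15)
print(merged)  # [(0, 10), (10, 20), (20, 30)]
```

The row `(10, 20)` that was duplicated by the overlapping split appears only once after concatenation.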

jmosbacher, Aug 28 '21 15:08