EQcorrscan
custom process-function options
Is your feature request related to a problem? Please describe.
Gaps in seismic data cause most of the issues with normalisation of correlations. EQcorrscan's `pre_processing` functions take care of gaps pretty well now, but it would be good to expose to users how gaps are handled, so that they can easily write their own custom process functions (e.g. adding processing steps, using a different type of filtering, or decimating rather than resampling) that also handle gaps in the way the correlation functions expect.
It would also be useful if the `match_filter` objects allowed a custom process function to be specified (in a similar way to how users can specify any correlation function they want). This would allow more people to use those objects.
Describe the solution you'd like
- Refactor gap handling as a context manager;
- Provide a new keyword argument, `process_func`, for match-filter objects.
Both would require docs and tutorials to make it clear how they should be used - in general the docs are in real need of a tidy.
- `pre_processing._fill_gaps` and `pre_processing._zero_pad_gaps` would be repurposed as the `__enter__` and `__exit__` methods of a `HandleGaps` context manager. The API would end up looking something like this:
```python
from eqcorrscan.pre_processing import HandleGaps

with HandleGaps(tr):
    custom_processing(tr)
```
- The `process_func` keyword would be fairly simple: add an extra argument, check when calling the processing functions whether it has been set, and otherwise use the in-built processing functions.
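For illustration, the fallback logic for such a keyword might look something like this. `CustomTribe` and `default_process` are hypothetical names standing in for the real match-filter objects and EQcorrscan's built-in processing, not actual API:

```python
# Hypothetical sketch: a match-filter-like object falling back to
# built-in pre-processing when the user does not supply process_func.

def default_process(data):
    """Stand-in for the in-built pre-processing (not EQcorrscan code)."""
    return [2 * x for x in data]


class CustomTribe:
    def __init__(self, process_func=None):
        # Use the user-supplied function if set, else the in-built one
        self.process_func = process_func or default_process

    def detect(self, data):
        # Pre-process with whichever function was configured
        return self.process_func(data)


print(CustomTribe().detect([1, 2, 3]))                           # -> [2, 4, 6]
print(CustomTribe(process_func=lambda d: d).detect([1, 2, 3]))   # -> [1, 2, 3]
```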
Hey @calum-chamberlain,
This looks interesting. I love the idea of simply doing the thing most people will want by default, but allowing users to modify the default behaviour when needed. A few thoughts/questions:
- It's probably better to have the `__enter__` method return a trace/stream rather than assuming it will operate in place. This will allow the logic of the pre-processor to operate in place or not, and then just return the resulting object. So the API could look something like this:
```python
from eqcorrscan.pre_processing import HandleGaps

with HandleGaps(tr) as trr:
    custom_processing(trr)
```
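As a rough sketch of what such a context manager could look like internally, here is a pure-numpy version with NaN-marked gaps standing in for obspy Traces (illustrative only, not EQcorrscan code):

```python
import numpy as np

class HandleGaps:
    """Sketch of the proposed context manager (not EQcorrscan code).

    Gaps are marked with NaN for simplicity; the real functions work
    on obspy Traces/Streams.
    """

    def __init__(self, data):
        self.data = np.asarray(data, dtype=float)
        self.gaps = np.isnan(self.data)  # remember where the gaps were

    def __enter__(self):
        # _fill_gaps equivalent: interpolate over gaps so processing
        # sees a continuous trace, and return the trace as suggested
        idx = np.arange(len(self.data))
        self.data[self.gaps] = np.interp(
            idx[self.gaps], idx[~self.gaps], self.data[~self.gaps])
        return self.data

    def __exit__(self, exc_type, exc_value, traceback):
        # _zero_pad_gaps equivalent: zero the original gap positions,
        # guaranteed to run even if the processing step raised
        self.data[self.gaps] = 0.0
        return False  # do not swallow exceptions


trace = [1.0, float("nan"), float("nan"), 4.0]
with HandleGaps(trace) as tr:
    tr *= 2.0  # stand-in for filtering/resampling, in place
print(tr)  # -> [2. 0. 0. 8.]
```

The gaps were interpolated (to 2.0 and 3.0) for the processing step, then zeroed on exit.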
- What would the clean-up of the context manager do? The main strength of a context manager is ensuring the `__exit__` method gets called regardless of unhandled exceptions. Is there something you had in mind that needs to happen once the `HandleGaps` scope exits, or could a function call suffice to save a level of indentation?
- Users may want to apply several pre-processing methods in a particular order. It may be useful to provide a way to chain them together; something similar to scikit-learn's `Pipeline`, maybe?
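A minimal sketch of that chaining idea, loosely modelled on scikit-learn's `Pipeline` (all names here are illustrative, not EQcorrscan or scikit-learn API):

```python
# Chain pre-processing steps so they always run in a fixed order.

class ProcessPipeline:
    def __init__(self, steps):
        # steps: ordered list of (name, callable) pairs
        self.steps = steps

    def __call__(self, data):
        # Apply each step in turn, feeding the output onwards
        for _name, func in self.steps:
            data = func(data)
        return data


pipeline = ProcessPipeline([
    ("demean", lambda d: [x - sum(d) / len(d) for x in d]),
    ("scale", lambda d: [2 * x for x in d]),
])
print(pipeline([1.0, 2.0, 3.0]))  # -> [-2.0, 0.0, 2.0]
```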
Thanks for those @d-chambers (also thanks for the book recommendation, I'm chewing my way through it, and almost every page has something of great interest!).
At the moment, the function `_fill_gaps` is run before filtering and resampling, and `_zero_pad_gaps` is used after processing to fill the gaps found by `_fill_gaps` with zeros. I was thinking that `_fill_gaps` would be the equivalent of an `__enter__` and `_zero_pad_gaps` would be used as `__exit__`. What these functions do is:
- `_fill_gaps` finds gaps in the data and interpolates over them to enforce a continuous trace for processing; it returns the trace and the gap positions;
- `_zero_pad_gaps` takes the processed trace and the gap positions, and replaces the values at the gap positions with zeros to ensure that correlations are zero in any window where there was originally no data.

Do you think that would work? I was planning on it working in-place...
- That pipeline idea looks interesting; not sure how I would implement it, but it could be something fun in the future.
No problem, that book is incredible, I learned a ton from it. There are still parts of it, especially the async stuff, that I am struggling to wrap my head around.
Ok, so `_zero_pad_gaps` actually acts on the resulting correlogram, correct? Ya, that makes sense to me.
Ah, no, `_zero_pad_gaps` just works on the trace data... this would just encompass pre-processing (filter and resample, not correlate). Does that make sense? The flow is something like:
- Read in data that has some gaps into a `Stream` with multiple segments;
- Call `_fill_gaps` to make the data continuous;
- Filter, resample and anything else;
- Call `_zero_pad_gaps` to cut out data at the gap positions determined by `_fill_gaps` and replace it with zeros;
- Call `match_filter`; the correlation function returns zeros when there are fewer than two non-zero samples in the correlation window.
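The flow above can be sketched with plain numpy, with NaN-marked gaps standing in for real obspy gap handling (illustrative only; the real functions operate on obspy Streams/Traces):

```python
import numpy as np

def fill_gaps(data):
    """Interpolate over gaps, remembering where they were."""
    data = np.asarray(data, dtype=float)
    gaps = np.isnan(data)  # gaps marked with NaN in this sketch
    idx = np.arange(len(data))
    data[gaps] = np.interp(idx[gaps], idx[~gaps], data[~gaps])
    return data, gaps


# 1. "Read in" a trace with a gap in the middle
trace = [1.0, float("nan"), float("nan"), 4.0, 5.0]

# 2. Make the data continuous, keeping the gap positions
filled, gaps = fill_gaps(trace)

# 3. Filter/resample; a simple gain stands in for that here
processed = filled * 0.5

# 4. Zero-pad the original gap positions before correlating
processed[gaps] = 0.0
print(processed)  # -> [0.5 0.  0.  2.  2.5]
```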
I was imagining having the `_fill_gaps` step as `__enter__` and the `_zero_pad_gaps` step as `__exit__`. It's not easy to edit the correlogram, because the stacked correlogram is returned for memory efficiency.
So the context manager is specifically to enforce `_fill_gaps` being called first and `_zero_pad_gaps` being called last in the preprocessing, correct? I can see why that would be useful, and it does seem like a good fit for a context manager to me.
Yup, that's it - I'm hoping that it would be simple for people to use as well - it's the only bit of the processing functions that I would say is really required by the correlation functions. Everything else could/should be personal preference.
Playing around more with this, I don't think the context manager fits and I'm just going to expose the (previously "private") gap handling functions.
Working on adding custom processing functions here