QuakeMigrate icon indicating copy to clipboard operation
QuakeMigrate copied to clipboard

:atom: Rewrite onset functions as C library

Open hemmelig opened this issue 2 years ago • 3 comments

What is the purpose of this Pull Request?

Rewrite the current onset functions (one in Python, one imported from ObsPy) as a C library that is compiled and wrapped by QuakeMigrate.

Time profiling confirmed using our own C functions was faster than the previous function (though, in fairness, they were already pretty quick!) by a factor of ~3.

Documentation for all the C functions has been updated and the names of variables have been reviewed to improve clarity.

A simple test has been added that validates the new functions using a simple artificial dataset where the expected results have been computed independently.

Relevant Issues

Replaces existing Pull Request #95.

Test cases

Added a new unit test that compares the outputs of the new functions with hand-computed values. There are some very minor differences in the final outputs of the overlapping STA/LTA function (figure below), but these are down on the order of 10E-13 and do not cause any failures in the benchmark testing.

Screenshot 2023-10-29 at 11 03 52 PM
  • [x] All examples run without any new warnings
  • [x] test_benchmarks.py reports all example tests pass

System details

macOS Monterey 12.5.1, Python 3.8 and Python 3.11. The workflow tests (Linux) all pass, too.

Final checklist

  • [x] Correct base branch selected?
    • master (if a bugfix/patch update)
    • version/vX.X.X if a new significant feature (discuss with developers)
  • [x] All tests still pass.
  • [x] Any new features or fixed regressions are covered by new tests.
  • [x] Any new or changed features are fully documented.

hemmelig avatar Oct 30 '23 03:10 hemmelig

In addition to the changes already completed, we will also try to merge in some additional changes that have been made to the onset computation. These changes concern the signal transformation that is applied to the waveform data prior to computing the onset function. Right now, the default is to compute the onset function using the square of the waveform data—that is, the energy, rather than the amplitude of the waveforms. This would remain the default, but some additional options are:

  • The absolute value of the waveform amplitudes
  • The envelope of the waveform amplitudes (a.k.a. the Hilbert transform of the waveforms)
  • The envelope squared

The question is where exactly we handle this. The implementation in this PR has the signal transform handled in the C functions. We could retain this (might be more efficient?) by using an enum as a flag in the C functions (and corresponding Python implementations, that we have decided to retain).

A couple of other minor changes include:

  • Introducing a minimum onset value, which is equivalent to setting a minimum SNR filter for which observations to include (default to a very low value i.e., no filter)
  • Pad the onset functions with 1s, not 0s, as a default null result

There may be some other minor additions, the above is based on a branch for which they may be some additional updates

@TomWinder does that seem about the long and short of it?

hemmelig avatar Mar 25 '24 21:03 hemmelig

Further to the changes outlined above, a further change/fix to the onset function computation has been identified:

  • Move the application of log to the wrapper function for C migrate function. This makes the compute_onsets function perform fewer 'hidden' or unnecessary tasks—it now only does what it 'says on the tin'. It also fixes a bug wherein the log was being taken before the onset functions from multiple components were being combined.

The changes outlined above will cause changes to the results, meaning the benchmarks will have to be updated. While making these changes, it makes sense to also update the Iceland icequake example to use a more appropriate set of input parameters.

hemmelig avatar Mar 27 '24 15:03 hemmelig

@TomWinder just bumping this for review.

hemmelig avatar May 23 '24 16:05 hemmelig

@TomWinder bumping this for review (shouldn't take too long).

hemmelig avatar Aug 08 '24 19:08 hemmelig

FYI started here comparing to v1.1 / my trigger_locate_dec20 branch, which had a lot of these changes implemented already (moving log, different onset options, padding with 1's etc etc). (from this: master...TomWinder:QuakeMigrate:trigger_locate_dec20)

General things:

  • did we(/you) not also have a tweak to detect/trigger time period for the icequake_iceland example (to focus in on the single real event, and reduce unecessary runtime -- maybe also a change to the timestep)?
  • notebook updates (also mentioned in an in-line comment)
  • currently not possible to actually use the pure-python sta/lta functions (if I'm not mistaken?) Maybe a backend=python kwarg (or similar, can't remember how obspy dealt with this).

Will now pull locally and have a play with it. But all looks great on big-picture level.

Yes, I believe I did at one point but exact changes likely lost to mist of time (some cloned version somewhere, maybe stashed, etc etc...). I think the change, for now, does a good job but can look into it.

Can add a kwarg as suggested, I think I just assumed someone would only want to use them in a standalone notebook and could just import them.

hemmelig avatar Aug 21 '24 18:08 hemmelig

Pulled, tested, read through. Tests all pass.

System: Ubuntu 22.04.4 LTS Python: 3.11.9 (AMD EPYC CPU)

Minor bits:

  • icequake_iceland pick plotting took 270 seconds on a fast server.. did we used to have this switched off by default? (I think so..)
  • icequake_iceland marginal window should be much shorter. I played a bit and 0.1 s seems pretty good to me.
  • icequake_rutford detect compute is very heavy with this config. Does it make enough difference to run detect at decimate([2, 2, 2]) to be worth it not to decimate? Seems likely result is to turn people away...
  • icequake_rutford likewise, 190 s for pick plots. Don't remember it being quite this slow previously but it's been a while.
  • VT_iceland: should use "remove_full_response = False" within the response_params
  • VT_iceland event summary figure even is being suuper slow to plot (~ 20 s) here; maybe just something weird with the system. Fine to ignore if it seems ok on your end.

Re. final tweaks and getting this merged, I will have ~ zero internet til Sunday, so I'll approve here then do as needed to get it merged in (review with QM or whatever) if needed before next week.

The pick plotting is taking how long?! It takes 3 seconds on my laptop for the Iceland icequake example, and 2.85 seconds for the Rutford one... We should do some profiling to see what's causing this huge difference.

Ahh, yes—that's what I previously changed and it pulls out 2 additional events, it's coming back to me now! I'll have a fiddle.

For the Rutford one, you're right. I can instead have it default to perform some decimation, and leave a comment saying 'if your system allows, you can increase the resolution to explore the difference it makes'. I think the value here is that it really nicely highlights how you get this spiking peak, rather than a smooth Gaussian, which is caused by the insufficiently fine spatial resolution for how narrow the onset function peaks are (something the boxcar smoothing would address, in the long run?).

hemmelig avatar Aug 26 '24 17:08 hemmelig

In fact, going even shorter (0.032 seconds, or 8 samples at 250 Hz) seems to really do the trick for the Iceland icequake example.

Reducing the Rutford example marginal window to 0.028 s also does magic. Now I need to go away and think about what the trade-offs are—we can't just keep reducing the marginal window, of course, but it should be guided by the width of the coalescence peak, right? Perhaps we could seed an estimated marginal window at runtime, but let QM do some fitting (similar process as we used for picking)? Just a thought, for another time.

hemmelig avatar Aug 26 '24 17:08 hemmelig

In the attempt to tie these values to something concrete, I've adjusted them to be approximately the full duration at half maximum.

hemmelig avatar Aug 27 '24 18:08 hemmelig

The above failures are my fault—I was being sloppy with environments. I'll resolve on my end and fix.

hemmelig avatar Aug 29 '24 13:08 hemmelig

Please could you update on this? (i.e. - should it be fine for a final check and review?)

TomWinder avatar Sep 02 '24 16:09 TomWinder

Actually during testing something came up re NumPy and Python versions. NumPy 2.0 does not support Python 3.8, presumably because Python 3.8 will no longer be recommended for use come October. Our tests pass (i.e., we do not need to change anything to make use of NumPy 2.0), but it is probably worth confirming that there is no change in our benchmark results between the last 1.X.X NumPy and 2.0+. I propose we drop support for Python 3.8 along with this PR (and add support for Python 3.12), then we can get this merged. Thoughts?

hemmelig avatar Sep 02 '24 16:09 hemmelig

Yes happy with dropping 3.8 and adding 3.12. Lmk if I can do anything useful on the NumPy version testing.

For reference, obspy have been discussing dropping 3.8 since October 2022, and NumPy actually dropped 3.9 in April: NEP

TomWinder avatar Sep 02 '24 17:09 TomWinder

Ok, updated everything. Let me know if I've covered it all and I'll proceed with the planned build on Test PyPI as discussed.

hemmelig avatar Sep 03 '24 15:09 hemmelig

Looks great. Will just pull and test locally for a final time.

TomWinder avatar Sep 03 '24 15:09 TomWinder

Ok—once the tests run and pass, everything is ready to merge on my end. Once it's merged down to master, I will tag the master branch and create a release. Once the release is approved, everything should build and be available through PyPI!

hemmelig avatar Sep 05 '24 14:09 hemmelig

Checking through (on Python 3.11 atm). Do you also see this FutureWarning? (in Icequake_Iceland/iceland_trigger.py)

        Triggering events...
/home/tebw2/Software/.miniconda3/envs/py311/lib/python3.11/site-packages/quakemigrate/signal/trigger.py:461: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
  triggers = pd.concat(

pandas version 2.2.2 (from PyPI)

TomWinder avatar Sep 05 '24 14:09 TomWinder

Also still currently suffering from ridiculous plotting times, will have a small dig and also ensure it's not happening anywhere else.

TomWinder avatar Sep 05 '24 14:09 TomWinder

Also still currently suffering from ridiculous plotting times, will have a small dig and also ensure it's not happening anywhere else.

It's definitely a backend issue - with a DISPLAY output, it tool 18.2 seconds for Iceland_Icequake trigger summary; with DISPLAY="" (i.e. Agg) it reduced to 0.86 seconds .........

TomWinder avatar Sep 05 '24 15:09 TomWinder

Added a final commit that addresses the future warning. Initial attempts were unsuccessful the other day, but figured it out pretty quick this time.

hemmelig avatar Sep 05 '24 15:09 hemmelig