HDF5 write/read support for ObsPy

obspyh5

|buildstatus| |coverage| |version| |pyversions| |zenodo|

.. |buildstatus| image:: https://github.com/trichter/obspyh5/workflows/tests/badge.svg
   :target: https://github.com/trichter/obspyh5/actions

.. |coverage| image:: https://codecov.io/gh/trichter/obspyh5/branch/master/graph/badge.svg
   :target: https://codecov.io/gh/trichter/obspyh5

.. |version| image:: https://img.shields.io/pypi/v/obspyh5.svg
   :target: https://pypi.python.org/pypi/obspyh5

.. |pyversions| image:: https://img.shields.io/pypi/pyversions/obspyh5.svg
   :target: https://python.org

.. |zenodo| image:: https://zenodo.org/badge/DOI/10.5281/zenodo.3953668.svg
   :target: https://doi.org/10.5281/zenodo.3953668

Writes and reads ObsPy streams to/from HDF5 files. Stats attributes are preserved if they are numbers, strings, UTCDateTime objects or numpy arrays. It can be used as a plugin to ObsPy's read function to read a whole HDF5 file. Alternatively, you can iterate over the traces in an HDF5 file with the iterh5 function.

Installation
^^^^^^^^^^^^
Install h5py and obspy. After that install obspyh5 using pip by::

    pip install obspyh5

With conda the package can be installed into a fresh environment with::

    conda config --add channels conda-forge
    conda create -n obsh5 numpy obspy h5py
    conda activate obsh5
    pip install obspyh5

Usage
^^^^^
Basic example using the obspy plugin::

    >>> from obspy import read
    >>> stream = read()  # load example stream
    >>> print(stream)
    3 Trace(s) in Stream:
    BW.RJOB..EHZ | 2009-08-24T00:20:03.000000Z - 2009-08-24T00:20:32.990000Z | 100.0 Hz, 3000 samples
    BW.RJOB..EHN | 2009-08-24T00:20:03.000000Z - 2009-08-24T00:20:32.990000Z | 100.0 Hz, 3000 samples
    BW.RJOB..EHE | 2009-08-24T00:20:03.000000Z - 2009-08-24T00:20:32.990000Z | 100.0 Hz, 3000 samples
    >>> stream.write('test.h5', 'H5')  # declare 'H5' as format
    >>> print(read('test.h5'))  # order is preserved only for default index
    3 Trace(s) in Stream:
    BW.RJOB..EHZ | 2009-08-24T00:20:03.000000Z - 2009-08-24T00:20:32.990000Z | 100.0 Hz, 3000 samples
    BW.RJOB..EHN | 2009-08-24T00:20:03.000000Z - 2009-08-24T00:20:32.990000Z | 100.0 Hz, 3000 samples
    BW.RJOB..EHE | 2009-08-24T00:20:03.000000Z - 2009-08-24T00:20:32.990000Z | 100.0 Hz, 3000 samples

Example iterating over traces in a huge HDF5 file. A trace is not kept in memory after its iteration, so even a very large file can be processed on an ordinary PC::

    >>> from obspyh5 import iterh5
    >>> for trace in iterh5('huge_in.h5'):
    ...     trace.detrend('linear')  # example processing step, replace with your own
    ...     trace.write('huge_out.h5', 'H5', mode='a')  # append mode to write into file

Alternative indexing
^^^^^^^^^^^^^^^^^^^^
obspyh5 supports alternative indexing. ::

    >>> from obspy import read
    >>> import obspyh5
    >>> print(obspyh5._INDEX)  # default index
    waveforms/{trc_num:03d}_{id}_{starttime.datetime:%Y-%m-%dT%H:%M:%S}_{duration:.1f}s

The index gets populated by the stats object and the trace number when writing a trace, e.g. ::

    'waveforms/000_BW.RJOB..EHZ_2009-08-24T00:20:03_30.0s'
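
How the template turns into a key can be reproduced with plain Python string formatting. The sketch below is an illustration only, not obspyh5's code: it assumes the template is filled with ``str.format``, that ``duration`` is the time between the first and last sample, and it uses a hypothetical stand-in object for the start time (a ``SimpleNamespace`` with a ``datetime`` attribute) instead of ObsPy's UTCDateTime, so it runs without ObsPy installed.

```python
from datetime import datetime
from types import SimpleNamespace

# Default index template, as printed by obspyh5._INDEX above.
INDEX = ('waveforms/{trc_num:03d}_{id}_'
         '{starttime.datetime:%Y-%m-%dT%H:%M:%S}_{duration:.1f}s')

# Hypothetical stand-ins for the stats entries of the first example
# trace (BW.RJOB..EHZ, 3000 samples at 100 Hz).
starttime = SimpleNamespace(datetime=datetime(2009, 8, 24, 0, 20, 3))
duration = (3000 - 1) / 100.0  # assumed: seconds between first and last sample

# Fill the template; trc_num is the position of the trace in the stream.
key = INDEX.format(trc_num=0, id='BW.RJOB..EHZ',
                   starttime=starttime, duration=duration)
print(key)
```

The ``:03d`` and ``:.1f`` parts are ordinary format specifications, so the trace number is zero-padded to three digits and the duration is rounded to one decimal place.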

To change the index use set_index. ::

    >>> # flat index without trace number: writing a trace with the
    >>> # same metadata twice will overwrite the first one
    >>> obspyh5.set_index('flat')
    >>> obspyh5.set_index('nested')  # nested index
    >>> obspyh5.set_index('xcorr')  # xcorr indexing
    >>> obspyh5.set_index('waveforms/{network}.{station}/{distance}')  # custom indexing
    >>> obspyh5.set_index('waveforms/{trc_num:03d}_{station}')  # use of the trace number
    >>> obspyh5.set_index()  # default index
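
Why an index without the trace number can overwrite data is easy to see with plain string formatting. The following sketch does not call obspyh5; it assumes index templates are filled with ``str.format``, uses plain dicts as hypothetical stand-ins for stats objects, and the template ``waveforms/{network}.{station}`` is made up for the illustration.

```python
# Two traces with identical metadata, described by plain dicts
# (hypothetical stand-ins for ObsPy stats objects).
traces = [
    {'network': 'BW', 'station': 'RJOB'},
    {'network': 'BW', 'station': 'RJOB'},
]

without_num = 'waveforms/{network}.{station}'   # made-up template, no trace number
with_num = 'waveforms/{trc_num:03d}_{station}'  # template using the trace number

# str.format ignores keyword arguments that the template does not use.
keys_without = [without_num.format(trc_num=i, **tr) for i, tr in enumerate(traces)]
keys_with = [with_num.format(trc_num=i, **tr) for i, tr in enumerate(traces)]

print(len(set(keys_without)))  # number of distinct keys without trace number
print(len(set(keys_with)))     # number of distinct keys with trace number
```

Both traces map to the same key when the trace number is left out, so the second write would land on the first; including ``{trc_num}`` keeps the keys distinct.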

When using the 'xcorr' indexing, stats needs the entries 'network1', 'station1', 'location1', 'channel1' for the first station and 'network2', 'station2', 'location2', 'channel2' for the second station. An example::

    >>> from obspy import read
    >>> import obspyh5
    >>> obspyh5.set_index('xcorr')  # activate xcorr indexing
    >>> stream = read()
    >>> for i, tr in enumerate(stream):  # manipulate stats object
    ...     station1, station2 = 'ST1', 'ST%d' % i
    ...     channel1, channel2 = 'HHZ', 'HHN'
    ...     s = tr.stats
    ...     # we manipulate seed id so that important information gets
    ...     # printed by obspy
    ...     s.network, s.station = s.station1, s.channel1 = station1, channel1
    ...     s.location, s.channel = s.station2, s.channel2 = station2, channel2
    ...     s.network1 = s.network2 = 'BW'
    ...     s.location1 = s.location2 = ''
    >>> print(stream)
    3 Trace(s) in Stream:
    ST1.HHZ.ST0.HHN | 2009-08-24T00:20:03.000000Z - 2009-08-24T00:20:32.990000Z | 100.0 Hz, 3000 samples
    ST1.HHZ.ST1.HHN | 2009-08-24T00:20:03.000000Z - 2009-08-24T00:20:32.990000Z | 100.0 Hz, 3000 samples
    ST1.HHZ.ST2.HHN | 2009-08-24T00:20:03.000000Z - 2009-08-24T00:20:32.990000Z | 100.0 Hz, 3000 samples
    >>> stream.write('test_xcorr.h5', 'H5')
    >>> print(read('test_xcorr.h5'))
    3 Trace(s) in Stream:
    ST1.HHZ.ST0.HHN | 2009-08-24T00:20:03.000000Z - 2009-08-24T00:20:32.990000Z | 100.0 Hz, 3000 samples
    ST1.HHZ.ST1.HHN | 2009-08-24T00:20:03.000000Z - 2009-08-24T00:20:32.990000Z | 100.0 Hz, 3000 samples
    ST1.HHZ.ST2.HHN | 2009-08-24T00:20:03.000000Z - 2009-08-24T00:20:32.990000Z | 100.0 Hz, 3000 samples
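
Before writing with the 'xcorr' index it can help to check that all eight required entries are present. The helper below is a hypothetical convenience, not part of obspyh5; it only assumes that the stats object supports the ``in`` operator, as a plain dict does.

```python
# The eight stats entries required by the 'xcorr' index.
XCORR_KEYS = ('network1', 'station1', 'location1', 'channel1',
              'network2', 'station2', 'location2', 'channel2')


def missing_xcorr_keys(stats):
    """Return the required xcorr entries that are absent from stats."""
    return [key for key in XCORR_KEYS if key not in stats]


# Plain dict as a stand-in for a stats object; two entries are missing.
stats = {'network1': 'BW', 'station1': 'ST1', 'location1': '', 'channel1': 'HHZ',
         'network2': 'BW', 'station2': 'ST0'}
print(missing_xcorr_keys(stats))
```

An empty list means the entries are complete. Note that an empty string, as used for the location codes above, still counts as present.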

Note
^^^^
See also ASDF_ for a more comprehensive approach.

Use case: Cross-correlation of late Okhotsk coda (notebook_).

.. _ASDF: https://seismic-data.org/

.. _notebook: http://nbviewer.jupyter.org/github/trichter/notebooks/blob/master/cross_correlation_okhotsk_coda.ipynb