If pre-processing fails, all downloaded data is deleted
**Describe the bug/feature/issue**
I am using the mass downloader feature within PySEP. Data and metadata are saved as temporary miniSEED and StationXML files. However, when some pre-processing step fails:
```
[2025-07-21 00:51:10] - pysep - DEBUG: /Users/felix/Documents/POSTDOC/INVESTIGACION/Manuscripts/FAST_PAPER_EARTHQUAKE_SAMMAN_IRAN_2025/OUT_OF_COUNTRY_DATA/FELIX/GET_DATA_PYSEP/YAML_FILES/2025-06-20T174913_NORTHERN_AND_CENTRAL_IRAN/inv.xml
[2025-07-21 00:52:21] - pysep - INFO: cleaning up channel naming
[2025-07-21 00:54:54] - pysep - WARNING: BW.KW1..BHN can't write SAC headers: list index out of range
```
All data is deleted. Is there a way to keep the data already fetched, even if it has not been pre-processed?
**To Reproduce**

```bash
pysep -c 2025-06-20T174914_IRAN_remove_ir_massive_no_mtuq.yaml
```

See the YAML file in the Data section.
**Expected behavior**
If pre-processing fails, keep the downloaded files and their metadata instead of deleting everything. I would also like to know how to use PySEP simply to download data, without any pre-processing.
**Versions**
0.4.1
**Data**

```yaml
event_tag: 2025-06-20T174913_IRAN
config_file: null
client: IRIS
client_debug: false
timeout: 600
taup_model: ak135
use_mass_download: true
event_selection: default
origin_time: '2025-06-20T17:49:13.0000000Z'
seconds_before_event: 120
seconds_after_event: 3600
event_latitude: 35.44
event_longitude: 53.05
event_depth_km: 10.0
event_magnitude: 5.0
networks: '*'
stations: '*'
channels: "HH?,BH?,LH?"
locations: '*'
reference_time: '2025-06-20T17:49:13.0000000Z'
seconds_before_ref: 120
seconds_after_ref: 3600
phase_list:
- ttall
mindistance_km: 0.0
maxdistance_km: 5000.0
minazimuth: 0
maxazimuth: 360
minlatitude: null
maxlatitude: null
minlongitude: null
maxlongitude: null
remove_response: false
output_unit: VEL
water_level: 0
pre_filt: default
scale_factor: 1
resample_freq: 50
remove_clipped: false
remove_insufficient_length: false
remove_masked_data: true
fill_data_gaps: 0.0
gap_fraction: 0.5
log_level: DEBUG
```
Thanks for any help you can provide.
Hi @SeismoFelix, if you add `sac_raw` to your `write_files` parameter, PySEP will write out the files downloaded from the data center before doing any further processing, which means you will be able to retain the data even if the rest of the workflow fails. Will that do what you want?
One caveat is that at the moment there is no way to use those files to restart after a bug fix, i.e., you will have to re-download from the data center if you want the processing to be applied correctly, or manually feed `st` back in from an interactive Python environment.
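As a minimal sketch of that interactive approach (assuming the raw files were written via `sac_raw`; the paths below are placeholders, and re-attaching data to PySEP's internal workflow is not a documented API, so treat this as inspection and manual re-processing only):

```python
# Sketch only: read the raw files written by `sac_raw` back into memory
# with ObsPy so the downloaded data can be inspected or re-processed by
# hand. Paths are placeholders for your actual output directory.
from obspy import read, read_inventory

st = read("path/to/output/*.sac")               # raw waveforms from sac_raw
inv = read_inventory("path/to/output/inv.xml")  # the saved StationXML

print(st)
print(inv)

# From here you could re-run individual processing steps interactively,
# e.g. st.remove_response(inventory=inv). Attribute and method names on
# the Pysep object itself vary by version, so check the source linked below.
```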
Just FYI, this happens here: https://github.com/adjtomo/pysep/blob/devel/pysep/pysep.py#L1858
```yaml
write_files:
- inv
- event
- stream
- sac
- config_file
- stations_list
- sac_raw
```
Thanks @bch0w,
What currently happens is that in PySEP's mass download feature, I see a lot of miniSEED and XML files created in a temporary directory. Then, when the pre-processing starts, if something fails, everything vanishes, including the temporary miniSEED files. So I am unsure whether simply writing SAC files from miniSEED (which requires extracting information from the XML for the SAC headers) will still cause an error for some stations and, consequently, delete everything. What I would expect is, at the very least, that the miniSEED and XML files are preserved in case any pre-processing fails.
I think that by setting the parameters you mentioned I can do that, so thank you. At this moment, given how unreliable the data repositories can be during a mass download, I am just trying to use PySEP as if it were the ObsPy mass downloader.

I acknowledge this underutilizes PySEP, and you might wonder why I don't simply use the ObsPy mass downloader. However, I feel more comfortable using PySEP and setting it up just to get the miniSEED and XML files in this specific case.
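For reference, here is a minimal sketch of what the equivalent direct ObsPy mass-downloader call might look like. Values mirror the YAML above; the 45° maxradius approximates maxdistance_km: 5000, and the storage paths are placeholders rather than PySEP's actual directory layout:

```python
# Sketch only: the direct ObsPy MassDownloader call that the PySEP
# config above roughly wraps. Values mirror the YAML; storage paths
# are placeholders.
from obspy import UTCDateTime
from obspy.clients.fdsn.mass_downloader import (
    MassDownloader, CircularDomain, Restrictions)

origin = UTCDateTime("2025-06-20T17:49:13")

# maxdistance_km: 5000 expressed in degrees (~111.19 km per degree)
domain = CircularDomain(latitude=35.44, longitude=53.05,
                        minradius=0.0, maxradius=45.0)

restrictions = Restrictions(
    starttime=origin - 120,   # seconds_before_event
    endtime=origin + 3600,    # seconds_after_event
    network="*", station="*", location="*",
    channel_priorities=["HH[ZNE]", "BH[ZNE]", "LH[ZNE]"],
    reject_channels_with_gaps=False,
    minimum_length=0.0,
)

mdl = MassDownloader(providers=["IRIS"])
mdl.download(domain, restrictions,
             mseed_storage="waveforms", stationxml_storage="stations")
```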
Thanks again!
Hi @SeismoFelix, all those temporary files are generated by ObsPy's mass downloader directly, and then removed once the mass downloader finishes, after which PySEP resumes control with the downloaded data stored in memory as Stream and Inventory objects.
If you use the `sac_raw` option, then the Stream and Inventory are written out immediately after the data download (see these lines for the order of operations); no other actions are taken in between. They just get written as SAC files with barebones SAC headers. With this option, the only way you would lose that data is if the mass downloader itself failed, but that would then be an ObsPy issue.
And that's nice to hear, I think using PySEP as a mass downloader wrapper is totally within the scope of the code! Hope this works for you.