slow station metadata gathering
While downloading data through pysep, the step that logs "pysep - INFO: querying IRIS for station metadata" takes a very long time (~13 minutes on my setup). Looking further at the logs, it is clearly gathering station information for ALL stations available through IRIS and then removing the stations that do not satisfy the selection criteria provided as input. I am indeed using wildcards ("*") for networks, stations, locations, and channels. The only station selection criterion I'm using is a radius of 300 km from my event location, so I need information on only a very small proportion of all possible stations. Is there a way for pysep to handle this more intelligently, or is it limited by what its dependencies and data sources provide, so that this is the best it can do?
One option is for the user to know exactly which networks and stations fall in the region and to make the query more specific, but that is not always practical.
I'm attaching a config.yaml file for reproduction purposes. 2025-08-20T203445.yaml
Unfortunately we need to gather the station information first to be able to select for distance and azimuth, so that wildcard search seems unavoidable, and searching for all wildcards is probably just a slow process in general.
To speed things up, though, we could avoid requesting response-level metadata in that first search and request only station-level metadata, then do our curtailing, then re-query for response information once we have the shortened list:
Change level here: https://github.com/adjtomo/pysep/blob/43af4658945dfa64d487629d9faca3017c96ba6a/pysep/pysep.py#L990-L998
Find a way to feed that back after running curtail_stations: https://github.com/adjtomo/pysep/blob/43af4658945dfa64d487629d9faca3017c96ba6a/pysep/pysep.py#L1840-L1845
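A rough sketch of that two-step idea, written against plain tuples so it runs without a network connection. The helper names (`great_circle_km`, `curtail_by_radius`, `build_bulk_request`) are hypothetical and are not part of pysep, and the distance is a simple spherical approximation rather than whatever curtail_stations actually computes. It assumes the first query was made at station level and reduced to (network, station, latitude, longitude) records; the bulk rows follow the (network, station, location, channel, starttime, endtime) layout that ObsPy's `Client.get_stations_bulk` accepts.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def curtail_by_radius(stations, event_lat, event_lon, max_km):
    """Keep (network, station, lat, lon) records within max_km of the event."""
    return [s for s in stations
            if great_circle_km(event_lat, event_lon, s[2], s[3]) <= max_km]


def build_bulk_request(stations, starttime, endtime,
                       location="*", channel="*"):
    """Build one bulk row per surviving station for the second query."""
    return [(net, sta, location, channel, starttime, endtime)
            for net, sta, _lat, _lon in stations]


# Step 1 (assumed): a cheap station-level query produced these records.
# The station codes and coordinates below are made up for illustration.
station_level = [
    ("XX", "NEAR", 0.0, 1.0),   # about 111 km from the event
    ("XX", "FAR", 0.0, 5.0),    # about 556 km from the event
]

# Step 2: curtail locally, then ask for responses only for what is left.
kept = curtail_by_radius(station_level, event_lat=0.0, event_lon=0.0,
                         max_km=300.0)
bulk = build_bulk_request(kept, "2025-08-20T00:00:00", "2025-08-21T00:00:00")
```

With ObsPy, the rows in `bulk` (with the times converted to `UTCDateTime`) could then be passed to something like `client.get_stations_bulk(bulk, level="response")`, so the expensive response-level request only covers the curtailed list.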
I use the following function to gather station information for a specified geographic extent. The output is already curtailed, and it takes only a second. I guess reducing the level to station and then calling curtail_stations might perform similarly, but we might consider giving up the curtailing in pysep and just using what the IRIS FDSN web service already offers, unless there are other considerations. It's fast.
import io

import requests


def get_stations(latitude_min, latitude_max, longitude_min, longitude_max,
                 station="*", network="*", location="*", channel="*",
                 starttime="2000-01-01T00:00:00"):
    """
    Get stations from the IRIS FDSN web service within a specified geographic extent
    """
    params = {
        "minlatitude": latitude_min,
        "maxlatitude": latitude_max,
        "minlongitude": longitude_min,
        "maxlongitude": longitude_max,
        "station": station,
        "network": network,
        "location": location,
        "channel": channel,
        "starttime": starttime,
        "format": "text"
    }
    url = "https://service.iris.edu/fdsnws/station/1/query"
    response = requests.get(url, params=params)
    response.raise_for_status()  # Fail loudly on HTTP errors
    text_stream = io.StringIO(response.text)
    next(text_stream, None)  # Skip the header line (tolerate an empty reply)
    stations = []
    networks = []
    latitudes = []
    longitudes = []
    elevations = []
    # Station-level text rows: Network|Station|Latitude|Longitude|Elevation|...
    for line in text_stream:
        if not line.strip():
            continue  # Ignore blank lines
        parts = line.strip().split('|')
        stations.append(parts[1])
        networks.append(parts[0])
        latitudes.append(float(parts[2]))
        longitudes.append(float(parts[3]))
        elevations.append(float(parts[4]))
    return stations, networks, latitudes, longitudes, elevations
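Since the selection criterion in this issue is a radius around the event rather than a box, the same service can also be queried radially: the FDSN station web service accepts `latitude`, `longitude`, and `maxradius`, with the radius given in degrees. A minimal sketch of building those parameters from a distance in km follows; `radial_station_params` is a hypothetical helper, not something pysep provides, and the km-to-degree conversion assumes a spherical Earth of radius 6371 km.

```python
import math

# Length of one degree of arc on a sphere of radius 6371 km (~111.19 km).
KM_PER_DEGREE = 2 * math.pi * 6371.0 / 360.0


def radial_station_params(event_lat, event_lon, max_radius_km,
                          network="*", station="*", starttime=None,
                          level="station"):
    """Build FDSN station query parameters for a radial search."""
    params = {
        "latitude": event_lat,
        "longitude": event_lon,
        "maxradius": max_radius_km / KM_PER_DEGREE,  # service expects degrees
        "network": network,
        "station": station,
        "level": level,
        "format": "text",
    }
    if starttime is not None:
        params["starttime"] = starttime
    return params


# Example event location (made up for illustration), 300 km radius.
params = radial_station_params(61.0, -150.0, 300.0)
```

The resulting dictionary can be passed as `params=` to the same `requests.get` call on https://service.iris.edu/fdsnws/station/1/query used above, so the server does the distance curtailing before anything is downloaded.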