slow station metadata gathering

Open aakash10gupta opened this issue 4 months ago • 2 comments

While downloading data through pysep, the step that logs "pysep - INFO: querying IRIS for station metadata" takes a very long time (~13 minutes on my setup). Looking further at the logs, it appears to gather station information for ALL stations available through IRIS and then remove the stations that do not satisfy the selection criteria provided as input. I am indeed using wildcards ("*") for networks, stations, locations, and channels. The only station selection criterion I'm using is a radius of 300 km from my event location, so I need information on only a very small proportion of all possible stations. Is there a way for pysep to handle this more intelligently, or is it limited by what its dependencies and data sources provide, so that this is the best it can do?

One option is for the user to know exactly which networks and stations fall in the region and to be more specific with the query, but that is not always practical.

I'm attaching a config.yaml file for reproduction purposes. 2025-08-20T203445.yaml

aakash10gupta avatar Aug 21 '25 03:08 aakash10gupta

Unfortunately we need to gather the station information first to be able to select for distance and azimuth, so that wildcard search seems unavoidable, and searching for all wildcards is probably just a slow process in general.

But to speed things up, we could potentially skip response-level metadata in that first search and request only station-level metadata, then do our curtailing, then re-query for response information once we have the shortened list:

Change level here https://github.com/adjtomo/pysep/blob/43af4658945dfa64d487629d9faca3017c96ba6a/pysep/pysep.py#L990-L998

Find a way to feed that back after running curtail_stations https://github.com/adjtomo/pysep/blob/43af4658945dfa64d487629d9faca3017c96ba6a/pysep/pysep.py#L1840-L1845
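
A minimal sketch of the two-pass idea above, not pysep's actual implementation. Here `fetch` is a hypothetical stand-in for whatever performs the metadata query (for example a thin wrapper around ObsPy's `Client.get_stations`), and the tuple layout and function names are assumptions made for illustration. The first pass asks only for station-level metadata, the list is curtailed by distance, and only the surviving stations are re-queried at response level.

```python
import math


def distance_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle (haversine) distance in km on a spherical Earth."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def two_pass_metadata(fetch, event_lat, event_lon, max_km):
    """
    Hypothetical two-pass query.

    fetch(level, network, station) is an assumed callable:
      - level="station" returns a list of (network, station, lat, lon)
      - level="response" returns whatever full metadata is needed
    """
    # Pass 1: cheap, station-level metadata for the wildcard search
    candidates = fetch(level="station", network="*", station="*")

    # Curtail by distance to the event before asking for anything heavier
    keep = [(net, sta) for net, sta, lat, lon in candidates
            if distance_km(event_lat, event_lon, lat, lon) <= max_km]
    if not keep:
        return None

    # Pass 2: response-level metadata for the shortened list only
    networks = ",".join(sorted({net for net, _ in keep}))
    stations = ",".join(sorted({sta for _, sta in keep}))
    return fetch(level="response", network=networks, station=stations)
```

Passing comma-separated network and station lists can over-select, since a station code may exist in more than one of the listed networks, so the second result would likely need to be curtailed once more.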

bch0w avatar Aug 21 '25 18:08 bch0w

I use the following function to gather station information for a specified geographic extent. The output is already curtailed, and it takes only a second. I guess reducing the level to station and then calling curtail_stations might give similar performance, but we might consider giving up on the curtailing in pysep and just using what the IRIS FDSN web service already offers, unless there are other considerations. It's fast.

import io

import requests


def get_stations(latitude_min, latitude_max, longitude_min, longitude_max,
                 station="*", network="*", location="*", channel="*",
                 starttime="2000-01-01T00:00:00"):
    """
    Get stations from the IRIS FDSN web service within a specified geographic extent
    """
    params = {
        "minlatitude": latitude_min,
        "maxlatitude": latitude_max,
        "minlongitude": longitude_min,
        "maxlongitude": longitude_max,
        "station": station,
        "network": network,
        "location": location,
        "channel": channel,
        "starttime": starttime,
        "format": "text"
    }

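    url = "https://service.iris.edu/fdsnws/station/1/query"
    response = requests.get(url, params=params, timeout=60)
    response.raise_for_status()  # Fail loudly on HTTP errors
    text_stream = io.StringIO(response.text)
    next(text_stream, None)  # Skip the header line (no error if the body is empty)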

    stations = []
    networks = []
    latitudes = []
    longitudes = []
    elevations = []

    for line in text_stream:
        parts = line.strip().split('|')
        stations.append(parts[1])
        networks.append(parts[0])
        latitudes.append(float(parts[2]))
        longitudes.append(float(parts[3]))
        elevations.append(float(parts[4]))

    return stations, networks, latitudes, longitudes, elevations
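
For the radius-based case in the original report, the FDSN station web service specification also defines `latitude`, `longitude`, and `maxradius` (in degrees) as radial-search parameters, so the bounding box could in principle be replaced by a server-side circle. Below is a minimal sketch that only builds the query URL. The function name and the km-to-degree conversion (spherical Earth, radius 6371 km) are assumptions made for illustration.

```python
import math
from urllib.parse import urlencode

# Roughly 111.195 km per degree of arc, assuming a spherical Earth
KM_PER_DEG = math.pi * 6371.0 / 180.0


def radial_station_query(event_lat, event_lon, max_km,
                         base="https://service.iris.edu/fdsnws/station/1/query"):
    """Build a station-level FDSN query restricted to a circle around an event."""
    params = {
        "latitude": event_lat,
        "longitude": event_lon,
        "maxradius": round(max_km / KM_PER_DEG, 4),  # radius in degrees
        "level": "station",
        "format": "text",
    }
    return f"{base}?{urlencode(params)}"
```

The text response from such a query has the same pipe-delimited layout as in the function above, so the same parsing loop should apply.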

aakash10gupta avatar Aug 26 '25 00:08 aakash10gupta