obsplus icon indicating copy to clipboard operation
obsplus copied to clipboard

Ambiguous station_count column in events_to_df

Open sboltz opened this issue 4 years ago • 4 comments

  • obsplus version: master
  • Python version: 3.7.4
  • Operating System: Ubuntu 18.04

Description

One of the columns in the output of events_to_df is simply called station_count, which is a deviation from the naming convention for the OriginQuality object in obspy, which has an associated_station_count (number of stations at which the event was observed) and an used_station_count (number of stations from which data was used for origin computation). This is mildly confusing because from the label it is ambiguous as to which of these items the value refers to, or if it refers to something else entirely.

Proposed Change

Looking at the source code, it appears that station_count is intended to be the equivalent of used_station_count. I recommend changing the column name to reflect this. This might break things, but it should be relatively easy to update.

sboltz avatar Jul 26 '20 04:07 sboltz

Sounds like a good idea to me. I wouldn't worry about breaking changes to much as this is a relatively obscure column. Just make sure to add something in the changelog.

d-chambers avatar Jul 26 '20 04:07 d-chambers

Sounds good. I guess another thing, is this going to force everyone's EventBanks to re-index? That's where it might be a slightly bigger annoyance.

On Sat, Jul 25, 2020 at 9:52 PM Derrick [email protected] wrote:

Sounds like a good idea to me. I wouldn't worry about breaking changes to much as this is a relatively obscure column. Just make sure to add something in the changelog.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/niosh-mining/obsplus/issues/195#issuecomment-663935635, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADKRQJS74MYTHQH4TFY4B63R5OY7XANCNFSM4PHZMTJQ .

sboltz avatar Jul 26 '20 05:07 sboltz

Ya, good point. It looks like 'station_countis used in the event bank index. At one point I had thought about restricting the index to only include columns necessary to perform the queries allowed byget_events` but other programs (like our processing software) currently need some of the other columns.

Maybe we just leave this issue open until we have to make a more significant breaking change to EventBank then tackle it then?

d-chambers avatar Jul 26 '20 05:07 d-chambers

Yeah, I think that leaving it open would be fine. It's pretty easy to just work around it for what I'm doing right now.

As far as restricting the columns, how much of a pain would it be to specify which columns from events_to_df to include in the index? I know that I personally use the extra columns a lot to look at location quality information without having to go and convert the entire bank to a df, but I can also see where not everyone would want or need that.

On Sat, Jul 25, 2020 at 10:16 PM Derrick [email protected] wrote:

Ya, good point. It looks like 'station_countis used in the event bank index. At one point I had thought about restricting the index to only include columns necessary to perform the queries allowed byget_events` but other programs (like our processing software) currently need some of the other columns.

Maybe we just leave this issue open until we have to make a more significant breaking change to EventBank then tackle it then?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/niosh-mining/obsplus/issues/195#issuecomment-663937027, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADKRQJRHPYSTPXWZER6Y6JTR5O32BANCNFSM4PHZMTJQ .

sboltz avatar Jul 26 '20 05:07 sboltz