
pick_count, station_count, etc. should really be int instead of float

Open shawnboltz opened this issue 11 months ago • 3 comments

Description

This is a minor annoyance, but the data type for some of the columns in events_to_df (and possibly some of the other DFExtractors), such as pick_count, station_count, etc., should really be int instead of float.

To Reproduce

import obsplus
obsplus.load_dataset("crandall_test").event_client.to_df().dtypes

Expected behavior

Here's the output of the above, where I've noted which columns should be changed:

time                      datetime64[ns]
latitude                         float64
longitude                        float64
depth                            float64
magnitude                        float64
event_description                 object
associated_phase_count           float64  # int64
azimuthal_gap                    float64
event_id                          object
horizontal_uncertainty           float64
local_magnitude                  float64
moment_magnitude                 float64
duration_magnitude               float64
magnitude_type                    object
p_phase_count                    float64  # int64
s_phase_count                    float64  # int64
p_pick_count                     float64  # int64
s_pick_count                     float64  # int64
standard_error                   float64
used_phase_count                 float64  # int64
station_count                    float64  # int64
vertical_uncertainty             float64
updated                   datetime64[ns]
author                            object
agency_id                         object
creation_time             datetime64[ns]
version                           object
stations                          object
dtype: object
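One way the requested change could look is sketched below. It assumes the conversion happens after the DataFrame is built and that pandas' nullable "Int64" dtype is an acceptable target; `COUNT_COLUMNS` and `counts_to_nullable_int` are hypothetical names, not part of obsplus, and the toy frame only stands in for the output of `event_client.to_df()`.

```python
import numpy as np
import pandas as pd

# Hypothetical list of the count-like columns marked "# int64" in the listing above.
COUNT_COLUMNS = [
    "associated_phase_count",
    "p_phase_count",
    "s_phase_count",
    "p_pick_count",
    "s_pick_count",
    "used_phase_count",
    "station_count",
]


def counts_to_nullable_int(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with count columns cast to pandas' nullable Int64.

    Missing values (NaN) become pd.NA instead of forcing a float dtype.
    Count columns that are absent from df are skipped.
    """
    present = [col for col in COUNT_COLUMNS if col in df.columns]
    return df.astype({col: "Int64" for col in present})


# Toy frame standing in for event_client.to_df(); not real catalog data.
events = pd.DataFrame(
    {
        "p_pick_count": [12.0, np.nan, 7.0],
        "station_count": [9.0, 4.0, np.nan],
        "azimuthal_gap": [110.5, np.nan, 250.0],
    }
)
converted = counts_to_nullable_int(events)
print(converted.dtypes)
```

Note that `astype("Int64")` refuses to cast floats with a fractional part, so this only works if the count columns really hold whole numbers; non-count columns such as azimuthal_gap stay float64.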

Because this would potentially force re-indexing of everyone's event banks, I recommend waiting to address this until we address #195 and/or other index issues.

Versions

  • ObsPlus Version '0.2.6.dev1+g17d7abd'

shawnboltz, Mar 01 '24 23:03

Looking a little more at what would be required to actually change this, obsplus.utils.pd._int_to_time_columns makes this more complicated than simply changing the dtype definition.

shawnboltz, Mar 02 '24 00:03

I am trying to remember exactly why, but I seem to recall there was a good reason these couldn't be ints. Perhaps it's because there isn't a good way to represent NaN with default numpy ints?

d-chambers, Mar 02 '24 01:03
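The NaN limitation suspected here can be seen directly with toy values (not obsplus data). This only illustrates general numpy/pandas behaviour; the thread does not confirm that it is the actual reason obsplus uses float columns.

```python
import numpy as np
import pandas as pd

# A plain numpy int64 array has no value reserved for "missing",
# so assigning NaN (a float) to one of its elements raises ValueError.
counts = np.array([12, 7, 9], dtype=np.int64)
try:
    counts[1] = np.nan
    nan_assignable = True
except ValueError:
    nan_assignable = False

# With default dtypes, pandas works around this by upcasting to float64
# as soon as a missing value appears, e.g. one event lacking a count.
with_missing = pd.Series([12, None, 9])

# Reindexing an int64 Series so that a label has no value also upcasts.
reindexed = pd.Series([12, 9], index=[0, 2]).reindex([0, 1, 2])

print(nan_assignable)      # False
print(with_missing.dtype)  # float64
print(reindexed.dtype)     # float64
```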

> I am trying to remember exactly why, but I seem to recall there was a good reason these couldn't be ints. Perhaps it's because there isn't a good way to represent NaN with default numpy ints?

Yeah, that could be why. I know pandas has a nullable integer dtype now, but I don't think it's doable with numpy itself.

shawnboltz, Mar 04 '24 16:03
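For reference, a small illustration of pandas' nullable integer dtype with made-up values. The -1 fill value is an arbitrary sentinel chosen only for this sketch, and whether the event bank's on-disk index could store this dtype is not addressed here.

```python
import pandas as pd

# "Int64" (capital I) keeps a separate validity mask, so a missing
# count becomes pd.NA while the remaining values stay integers.
station_count = pd.Series([12, None, 9], dtype="Int64")
print(station_count.dtype)            # Int64
print(station_count.isna().tolist())  # [False, True, False]

# Arithmetic keeps the integer dtype and propagates the missing value.
doubled = station_count * 2
print(doubled.tolist())  # [24, <NA>, 18]

# Getting a plain numpy int64 array back still requires filling the gaps,
# because numpy integers themselves cannot represent missing values.
as_numpy = station_count.fillna(-1).to_numpy(dtype="int64")
print(as_numpy)  # [12 -1  9]
```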