obsplus
obsplus copied to clipboard
pick_count, station_count, etc. should really be int instead of float
Description
This is a minor annoyance, but the data type for some of the columns in events_to_df
(and possibly some of the other DFExtractors), such as pick_count, station_count, etc., should really be an int instead of a float.
To Reproduce
import obsplus
obsplus.load_dataset("crandall_test").event_client.to_df().dtypes
Expected behavior Here's the output of the above where I've noted which columns should be changed
time datetime64[ns]
latitude float64
longitude float64
depth float64
magnitude float64
event_description object
associated_phase_count float64 # int64
azimuthal_gap float64
event_id object
horizontal_uncertainty float64
local_magnitude float64
moment_magnitude float64
duration_magnitude float64
magnitude_type object
p_phase_count float64 # int64
s_phase_count float64 # int64
p_pick_count float64 # int64
s_pick_count float64 # int64
standard_error float64
used_phase_count float64 # int64
station_count float64 # int64
vertical_uncertainty float64
updated datetime64[ns]
author object
agency_id object
creation_time datetime64[ns]
version object
stations object
dtype: object
Because this would potentially force re-indexing of everyone's event bank's, I recommend waiting to address this until we address #195 and/or other index issues.
Versions (please complete the following information):
- ObsPlus Version '0.2.6.dev1+g17d7abd'
Looking a little more at what would be required to actually change this, obsplus.utils.pd._int_to_time_columns
makes this more complicated than simply changing the dtype definition.
I am trying to remember exactly why, but I seem to recall there was a good reason these couldn't be ints. Perhaps its because there isn't a good way to represent NaN with default numpy ints?
I am trying to remember exactly why, but I seem to recall there was a good reason these couldn't be ints. Perhaps its because there isn't a good way to represent NaN with default numpy ints?
Yeah, that could be why. I know pandas has a nullable int now, but I don't think it's doable with numpy itself.