Colocation crashes when reading hourly model data in AeroCom format
Describe the bug
Colocating hourly model data read in AeroCom format crashes with a pandas ValueError ("Indexes have overlapping values") raised while merging station data.
- Pyaerocom version: 0.22.dev0; branch 1330-add-more-aeroval-base-configurations
- Computing platform:
- Configuration file:
from pyaerocom.aeroval import EvalSetup, ExperimentProcessor
from pyaerocom.aeroval.config.cameo.base_config import get_CFG
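# Base CAMEO configuration for the analysis year 2019
CFG = get_CFG(anayear=2019)
# Log failures instead of raising, and skip the model maps
CFG["raise_exceptions"] = False
CFG["add_model_maps"] = False
stp = EvalSetup(**CFG)
ana = ExperimentProcessor(stp)
ana.update_interface()
res = ana.run()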
- Error message
/lustre/storeB/project/fou/kl/CAMS2_40/task4041/EMEP.cameo/renamed/aerocom3_EMEP.cameo_concnh4_Surface_2019_hourly.nc.
Error: repr(Last timestamp of data 2019-12-31T00:00:00.000000 does not lie in end period: 2019-12-31 23:00)
Invalid var_name time for coord None in cube. Overwriting with time
Invalid long_name None for coord time in cube. Overwriting with Time
Invalid long_name latitude for coord lat in cube. Overwriting with Center coordinates for latitudes
Invalid long_name longitude for coord lon in cube. Overwriting with Center coordinates for longitudes
Failed to perform analysis: Traceback (most recent call last):
File "/home/jang/data/Python3/pyaerocom/pyaerocom/colocation/colocator.py", line 390, in run
coldata = self._run_helper(
^^^^^^^^^^^^^^^^^
File "/home/jang/data/Python3/pyaerocom/pyaerocom/colocation/colocator.py", line 1068, in _run_helper
coldata = self._colocation_func(**args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jang/data/Python3/pyaerocom/pyaerocom/colocation/colocation_utils.py", line 799, in colocate_gridded_ungridded
all_stats = data_ref.to_station_data_all(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jang/data/Python3/pyaerocom/pyaerocom/ungriddeddata.py", line 1257, in to_station_data_all
data = self.to_station_data(
^^^^^^^^^^^^^^^^^^^^^
File "/home/jang/data/Python3/pyaerocom/pyaerocom/ungriddeddata.py", line 955, in to_station_data
merged = merge_station_data(
^^^^^^^^^^^^^^^^^^^
File "/home/jang/data/Python3/pyaerocom/pyaerocom/helpers.py", line 931, in merge_station_data
merged = _merge_stats_2d(
^^^^^^^^^^^^^^^^
File "/home/jang/data/Python3/pyaerocom/pyaerocom/helpers.py", line 803, in _merge_stats_2d
merged.merge_other(
File "/home/jang/data/Python3/pyaerocom/pyaerocom/stationdata.py", line 868, in merge_other
self.merge_vardata(other, var_name, **kwargs)
File "/home/jang/data/Python3/pyaerocom/pyaerocom/stationdata.py", line 838, in merge_vardata
return self._merge_vardata_2d(other, var_name, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jang/data/Python3/pyaerocom/pyaerocom/stationdata.py", line 756, in _merge_vardata_2d
s0 = pd.concat([s0, s1], verify_integrity=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/modules/rhel8/user-apps/aerocom/conda2022/envs/pya-edit/lib/python3.11/site-packages/pandas/core/reshape/concat.py", line 395, in concat
return op.get_result()
^^^^^^^^^^^^^^^
File "/modules/rhel8/user-apps/aerocom/conda2022/envs/pya-edit/lib/python3.11/site-packages/pandas/core/reshape/concat.py", line 644, in get_result
new_index = self.new_axes[0]
^^^^^^^^^^^^^
File "properties.pyx", line 36, in pandas._libs.properties.CachedProperty.__get__
File "/modules/rhel8/user-apps/aerocom/conda2022/envs/pya-edit/lib/python3.11/site-packages/pandas/core/reshape/concat.py", line 702, in new_axes
return [
^
File "/modules/rhel8/user-apps/aerocom/conda2022/envs/pya-edit/lib/python3.11/site-packages/pandas/core/reshape/concat.py", line 703, in <listcomp>
self._get_concat_axis if i == self.bm_axis else self._get_comb_axis(i)
^^^^^^^^^^^^^^^^^^^^^
File "properties.pyx", line 36, in pandas._libs.properties.CachedProperty.__get__
File "/modules/rhel8/user-apps/aerocom/conda2022/envs/pya-edit/lib/python3.11/site-packages/pandas/core/reshape/concat.py", line 766, in _get_concat_axis
self._maybe_check_integrity(concat_axis)
File "/modules/rhel8/user-apps/aerocom/conda2022/envs/pya-edit/lib/python3.11/site-packages/pandas/core/reshape/concat.py", line 774, in _maybe_check_integrity
raise ValueError(f"Indexes have overlapping values: {overlap}")
ValueError: Indexes have overlapping values: DatetimeIndex(['2019-12-26'], dtype='datetime64[s]', freq=None)
To Reproduce
Steps to reproduce the behavior:
1. git switch 1330-add-more-aeroval-base-configurations
2. Put the config file above into a Python file
3. python <your created Python file>
Expected behavior
The run should complete without crashing.
Screenshots
None
Additional context
Analysis of hourly data for the CAMEO project.
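The ValueError at the bottom of the traceback can be reproduced in isolation. The following is a minimal sketch with made-up series (the values and dates are assumptions chosen for illustration, not data from the failing run). It only shows that `pd.concat(..., verify_integrity=True)` raises as soon as the two indexes share a timestamp, and how the overlap can be inspected without raising.

```python
import pandas as pd

# Two hypothetical station time series whose indexes share one timestamp.
s0 = pd.Series([1.0, 2.0], index=pd.to_datetime(["2019-12-25", "2019-12-26"]))
s1 = pd.Series([3.0, 4.0], index=pd.to_datetime(["2019-12-26", "2019-12-27"]))

# verify_integrity=True makes concat raise on duplicate index values.
try:
    pd.concat([s0, s1], verify_integrity=True)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# The shared timestamps can also be listed directly, without raising.
overlap = s0.index.intersection(s1.index)
print(list(overlap))
```

Printing the intersection of the two indexes just before the concat in `_merge_vardata_2d` might be a quick way to see which stations contribute the duplicated timestamp.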
Check the times again for precision. More generally speaking, we should probably reconsider how we colocate hourly data: whether we actually want to use the timestamps from the model, and how we deal with what the timestamps represent (beginning, middle, or end of the period).
I had another look at the times. Time variable of the old task4041 file:
double time(time) ;
time:standard_name = "time" ;
time:long_name = "time at middle of period" ;
time:units = "days since 1900-01-01" ;
time:calendar = "standard" ;
time:axis = "T" ;
data:
time = "2018-01-01 00:30", "2018-01-01 01:30", "2018-01-01 02:30",
New file:
double time(time) ;
time:standard_name = "time" ;
time:long_name = "time at end of period" ;
time:units = "days since 1900-01-01" ;
time:calendar = "standard" ;
time:axis = "T" ;
data:
time = "2019-01-01", "2019-01-01 01", "2019-01-01 02", "2019-01-01 03",
So the difference is that the new data uses the end point (time at end of period), while the old data uses the middle point (time at middle of period).
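To make the effect of the two conventions concrete, here is a small sketch that shifts timestamps to the start of their averaging period. The helper `to_period_start` and the `_FRACTION` table are hypothetical names used for illustration, not pyaerocom functions, and the sketch assumes that each `long_name` can be taken at face value and that the period is exactly one hour.

```python
from datetime import datetime, timedelta

# Position of the label inside the averaging period, as a fraction of its length.
_FRACTION = {"beginning": 0.0, "middle": 0.5, "end": 1.0}


def to_period_start(stamps, convention, period=timedelta(hours=1)):
    """Shift timestamps so that each one marks the start of its period."""
    return [t - _FRACTION[convention] * period for t in stamps]


# First two time values of each file, as listed in the dumps above.
old_file = [datetime(2018, 1, 1, 0, 30), datetime(2018, 1, 1, 1, 30)]
new_file = [datetime(2019, 1, 1, 0, 0), datetime(2019, 1, 1, 1, 0)]

old_start = to_period_start(old_file, "middle")
new_start = to_period_start(new_file, "end")

print(old_start[0])  # 2018-01-01 00:00:00
print(new_start[0])  # 2018-12-31 23:00:00
```

Under these assumptions the first value of the new file describes the hour starting at 2018-12-31 23:00, so every value is labelled one hour later than it would be under a start-of-period reading.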
I'm not aware of ever having seen the end point explicitly mentioned in the time variable's long name. Testing whether that might be the problem.
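One way to check this across files is to read the convention from the `long_name` attribute itself. The function below is a hypothetical helper, not part of pyaerocom, and it assumes the wording follows the pattern seen in the two dumps ("time at middle of period", "time at end of period"). It returns `None` when the attribute is missing or does not state a convention.

```python
import re

# Words that are assumed to name the labelling convention in a long_name.
_PATTERN = re.compile(r"\b(beginning|start|middle|end)\b", re.IGNORECASE)


def convention_from_long_name(long_name):
    """Return 'beginning', 'middle' or 'end' if long_name states it, else None."""
    match = _PATTERN.search(long_name or "")
    if match is None:
        return None
    word = match.group(1).lower()
    # Treat "start" as a synonym for "beginning".
    return "beginning" if word == "start" else word


print(convention_from_long_name("time at middle of period"))  # middle
print(convention_from_long_name("time at end of period"))     # end
print(convention_from_long_name(None))                        # None
```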
Closing this for now, since the EMEP data in AeroCom format is no longer used (in CAMEO at least).