pyam

Concatenating and/or appending between subannual and yearly data

Open · phackstock opened this issue 3 years ago · 3 comments

Problem description

When trying to combine two or more IamDataFrames using pyam.concat() or IamDataFrame.append(), we get an error if some of the frames have a subannual column and others do not. Here's a minimal example to reproduce the error:

from pyam import IamDataFrame, IAMC_IDX, concat
import pandas as pd

iam_frame = IamDataFrame(pd.DataFrame(
    [["model_a", "scen_a", "World", "Primary Energy", "EJ/yr", 1, 6.0]],
    columns=IAMC_IDX + [2005, 2010],
))
iam_frame_subannual = IamDataFrame(pd.DataFrame(
    [["model_a", "scen_a", "World", "Primary Energy", "EJ/yr", "summer", 1, 6.0]],
    columns=IAMC_IDX + ["subannual", 2005, 2010],
))
# Each of the following three calls raises ValueError: Incompatible timeseries data index dimensions
concat([iam_frame, iam_frame_subannual])
iam_frame.append(iam_frame_subannual)
iam_frame_subannual.append(iam_frame)
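To see where the "index dimensions" in the error message come from, it helps to look at the data in long format. The sketch below uses plain pandas and assumes, for illustration, that each frame is held as a long Series with one index level per IAMC column, per extra column and for the year; `to_long` is a hypothetical helper, not part of pyam.

```python
import pandas as pd

# Mirrors pyam's IAMC_IDX constant; redefined here so the sketch is self-contained.
IAMC_IDX = ["model", "scenario", "region", "variable", "unit"]


def to_long(wide, extra_cols=()):
    """Hypothetical helper: melt a wide IAMC-style table into a long Series
    indexed by IAMC_IDX + extra_cols + ["year"]."""
    id_vars = IAMC_IDX + list(extra_cols)
    long = wide.melt(id_vars=id_vars, var_name="year", value_name="value")
    return long.set_index(id_vars + ["year"])["value"]


yearly = to_long(pd.DataFrame(
    [["model_a", "scen_a", "World", "Primary Energy", "EJ/yr", 1, 6.0]],
    columns=IAMC_IDX + [2005, 2010],
))
subannual = to_long(pd.DataFrame(
    [["model_a", "scen_a", "World", "Primary Energy", "EJ/yr", "summer", 1, 6.0]],
    columns=IAMC_IDX + ["subannual", 2005, 2010],
), extra_cols=["subannual"])

# The extra "subannual" column becomes an extra index level.
print(yearly.index.nlevels, subannual.index.nlevels)
```

The yearly data ends up with 6 index levels and the subannual data with 7, so their rows are not directly comparable until the "subannual" level is reconciled.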

Proposed solution

Before append or concat performs its task, all IamDataFrames involved are checked for a "subannual" column. There are two possible outcomes of this check:

  1. All or none of the data frames have a "subannual" column. In this case, no further action is required.
  2. Some data frames have a "subannual" column while others do not. In this case, we add a "subannual" column with the value "year" to the frames that lack it and then proceed with concatenating or appending.
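The two cases of this proposal can be sketched with plain pandas on wide-format tables. This is a minimal sketch, not pyam's implementation: `harmonize_subannual` is a hypothetical helper, IAMC_IDX is redefined locally to mirror pyam's constant, and the sketch assumes the IAMC columns come first in each table.

```python
import pandas as pd

# Mirrors pyam's IAMC_IDX constant; redefined here so the sketch is self-contained.
IAMC_IDX = ["model", "scenario", "region", "variable", "unit"]


def harmonize_subannual(frames, fill_value="year"):
    """Hypothetical helper: give every frame a "subannual" column if any frame has one."""
    has_col = ["subannual" in df.columns for df in frames]
    # Case 1: all or none of the frames have the column, so nothing to do.
    if all(has_col) or not any(has_col):
        return list(frames)
    # Case 2: add the column (filled with `fill_value`) where it is missing,
    # placing it right after the IAMC columns (assumed to come first).
    harmonized = []
    for df, has in zip(frames, has_col):
        if not has:
            df = df.copy()  # leave the caller's frame untouched
            df.insert(len(IAMC_IDX), "subannual", fill_value)
        harmonized.append(df)
    return harmonized


yearly = pd.DataFrame(
    [["model_a", "scen_a", "World", "Primary Energy", "EJ/yr", 1, 6.0]],
    columns=IAMC_IDX + [2005, 2010],
)
subannual = pd.DataFrame(
    [["model_a", "scen_a", "World", "Primary Energy", "EJ/yr", "summer", 1, 6.0]],
    columns=IAMC_IDX + ["subannual", 2005, 2010],
)
combined = pd.concat(harmonize_subannual([yearly, subannual]), ignore_index=True)
print(combined)
```

Filling with "year" keeps every row addressable by the same set of columns; a plain pd.concat without this step would instead leave NaN in the subannual column for the yearly rows.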

phackstock · Feb 09 '22 13:02

Note that I implemented a similar solution in #598 for appending/merging IamDataFrame instances that combine yearly data (as integers) with continuous-time data (as datetime). I also did some refactoring and restructured the test suite so that concat and append behave in a similar manner.

danielhuppmann · Feb 11 '22 09:02

Ah, very good. I'll have a look and take some inspiration from that.

phackstock · Feb 14 '22 09:02

@EmiFej and I also came across this error. The error message ("Incompatible timeseries data index dimensions") is misleading: the error is thrown whenever there are extra columns in the dataframes being appended.
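One way to make the error more informative would be to compare the extra columns up front and name the mismatch. This is a minimal sketch with hypothetical helpers (`extra_cols`, `check_extra_cols`), not pyam's actual check; it assumes wide-format tables in which every non-IAMC column whose label is not an integer counts as an extra column.

```python
import numbers

import pandas as pd

IAMC_IDX = ["model", "scenario", "region", "variable", "unit"]


def extra_cols(wide):
    """Hypothetical rule: any non-IAMC column whose label is not an integer year."""
    return [
        c for c in wide.columns
        if c not in IAMC_IDX and not isinstance(c, numbers.Integral)
    ]


def check_extra_cols(frames):
    """Hypothetical check: raise a ValueError naming the extra columns that differ."""
    reference = set(extra_cols(frames[0]))
    for i, df in enumerate(frames[1:], start=1):
        cols = set(extra_cols(df))
        if cols != reference:
            raise ValueError(
                f"Cannot combine frame 0 and frame {i}: extra columns differ "
                f"({sorted(reference)} vs. {sorted(cols)})"
            )


yearly = pd.DataFrame(
    [["model_a", "scen_a", "World", "Primary Energy", "EJ/yr", 1, 6.0]],
    columns=IAMC_IDX + [2005, 2010],
)
subannual = pd.DataFrame(
    [["model_a", "scen_a", "World", "Primary Energy", "EJ/yr", "summer", 1, 6.0]],
    columns=IAMC_IDX + ["subannual", 2005, 2010],
)
try:
    check_extra_cols([yearly, subannual])
except ValueError as err:
    print(err)
```

A message that names the offending column points the user straight at the "subannual" mismatch instead of at abstract index dimensions.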

willu47 · Apr 07 '22 09:04