
make the allocated net generation work at non-annual frequencies

Open cmgosnell opened this issue 2 years ago • 5 comments

The problem

The allocation process employs three tables: the generation fuel table, the original generation table and the generators table (accessed via pudl_out.gf_eia923(), pudl_out.gen_original_eia923() and pudl_out.gens_eia860() respectively). Both the gf and gen tables are originally reported monthly, while the gens table is annual. The allocation process doesn’t know to make the gens table monthly before trying to allocate. Given that, I think teaching the allocation process to deal with monthly data would be a relatively straightforward fix.
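To make the mismatch concrete, here is a minimal sketch of turning an annual table into a monthly one by repeating each record for every month of its report year. The toy `gens_annual` frame and the `expand_annual_to_monthly` helper are hypothetical illustrations built on plain pandas; they are not part of PUDL and not necessarily how the allocation code should do it.

```python
import pandas as pd

# Hypothetical stand-in for the annual generators table:
# one row per generator per report year.
gens_annual = pd.DataFrame({
    "plant_id_eia": [1, 1, 2],
    "generator_id": ["a", "b", "a"],
    "report_date": pd.to_datetime(["2020-01-01"] * 3),
    "capacity_mw": [10.0, 20.0, 5.0],
})


def expand_annual_to_monthly(df, date_col="report_date"):
    """Repeat each annual record once per month of its report year (illustrative helper)."""
    months = pd.DataFrame({"month": range(1, 13)})
    # Cross join: every annual record is paired with each of the 12 months.
    out = df.assign(year=df[date_col].dt.year).merge(months, how="cross")
    # Rebuild the date as the first day of each month.
    out[date_col] = pd.to_datetime(out[["year", "month"]].assign(day=1))
    return out.drop(columns=["year", "month"])


gens_monthly = expand_annual_to_monthly(gens_annual)
```

After this step every generator has one record per month, so it can be lined up with the monthly gf and gen tables on `report_date`.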

Seeing the problem

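import matplotlib.pyplot as plt
import pudl

# pudl_engine is assumed to be an existing SQLAlchemy engine connected to the PUDL database
# instantiate a monthly pudl_out object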
pudl_out_ms = pudl.output.pudltabl.PudlTabl(
    pudl_engine,
    freq='MS',
    fill_net_gen=True 
)
gen = pudl_out_ms.gen_eia923()
# the two interim outputs
gen_original = pudl_out_ms.gen_original_eia923()
gen_allocated = pudl_out_ms.gen_allocated_eia923()

# make the annual version for comparison
pudl_out_as = pudl.output.pudltabl.PudlTabl(
    pudl_engine,
    freq='AS',
    fill_fuel_cost=True,
    roll_fuel_cost=True,
    fill_net_gen=True 
)
gen_as = pudl_out_as.gen_eia923()

# make some plots
gen_options = {
    "Monthly Allocated": gen_allocated,
    "Monthly Original": gen_original,
    "Annual Allocated": gen_as
}
for gen_type, df in gen_options.items():
    gen_sum = df.groupby(['report_date'])[['net_generation_mwh']].sum().sort_index()
    plt.plot(
        gen_sum.index, gen_sum.net_generation_mwh, '.--',
        label=gen_type
    )
plt.legend()
plt.title("Net Generation")
plt.show()

[image.png: the "Net Generation" plot produced by the code above]

The solution??

I thiiiiink this could all be solved by using pudl.helpers.clean_merge_asof in pudl.analysis.allocate_net_gen.associate_generator_tables

cmgosnell • Feb 17 '22 17:02

@zaneselvans would I need to employ some new homebrew solution for this, because the higher frequency dataframe (the monthly generation table) is the less complete table? clean_merge_asof requires the left df to be the higher frequency df and also effectively employs a left merge.
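A small sketch of the concern, using plain `pandas.merge_asof` rather than the PUDL helper (whose internals are not shown in this thread). The toy frames are hypothetical: the monthly table only knows about generator `a`, while the annual table also lists generator `b`.

```python
import pandas as pd

# Hypothetical monthly generation table: only generator "a" reports.
gen_monthly = pd.DataFrame({
    "report_date": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
    "generator_id": ["a", "a", "a"],
    "net_generation_mwh": [100.0, 110.0, 90.0],
})

# Hypothetical annual generators table: lists both "a" and "b".
gens_annual = pd.DataFrame({
    "report_date": pd.to_datetime(["2020-01-01", "2020-01-01"]),
    "generator_id": ["a", "b"],
    "capacity_mw": [10.0, 20.0],
})

# merge_asof keeps exactly the rows of the left frame and looks backward
# in time for the most recent matching right-hand record.
merged = pd.merge_asof(
    gen_monthly.sort_values("report_date"),
    gens_annual.sort_values("report_date"),
    on="report_date",
    by="generator_id",
    direction="backward",
)
```

Because the output rows come from the left (monthly) frame, generator `b` never appears in `merged`, which is the problem when the monthly table is the less complete one.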

cmgosnell • Feb 17 '22 17:02

Remember that we hated clean_merge_asof because it is slow AF and has some weird edge case behavior. I thought we wanted to do something simpler that just decomposes dates into year, month, and day columns, and then merges treating them like independent entities? Which I think would work in this case as well.
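A rough sketch of what that simpler approach might look like, assuming only the year needs to match between the monthly and annual tables. The `merge_on_year` helper and the toy frames are hypothetical and not an existing PUDL function.

```python
import pandas as pd

# Hypothetical monthly generation records spanning two report years.
gen_monthly = pd.DataFrame({
    "plant_id_eia": [1, 1, 1],
    "generator_id": ["a", "a", "a"],
    "report_date": pd.to_datetime(["2020-01-01", "2020-02-01", "2021-01-01"]),
    "net_generation_mwh": [100.0, 110.0, 120.0],
})

# Hypothetical annual generator attributes, one row per year.
gens_annual = pd.DataFrame({
    "plant_id_eia": [1, 1],
    "generator_id": ["a", "a"],
    "report_date": pd.to_datetime(["2020-01-01", "2021-01-01"]),
    "capacity_mw": [10.0, 12.0],
})


def merge_on_year(monthly, annual, id_cols, date_col="report_date", how="left"):
    """Merge a monthly frame with an annual one on the ID columns plus the year (illustrative)."""
    monthly = monthly.assign(report_year=monthly[date_col].dt.year)
    # Drop the annual date so only the decomposed year takes part in the merge.
    annual = annual.assign(report_year=annual[date_col].dt.year).drop(columns=[date_col])
    keys = list(id_cols) + ["report_year"]
    return monthly.merge(annual, on=keys, how=how).drop(columns=["report_year"])


merged = merge_on_year(gen_monthly, gens_annual, ["plant_id_eia", "generator_id"])
```

This is an ordinary equality merge, so it avoids the sorting and nearest-match logic of an asof merge. With `how="left"` it still only keeps generators present in the monthly table; a different `how` would be needed to keep generators that appear only in the annual table, and those rows would come back without a monthly `report_date`.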

zaneselvans • Feb 17 '22 19:02

Oh definitely this has never been a long-term solution. Maybe it was @TrentonBush who suggested a clean version that decomposed the dates?

cmgosnell • Feb 17 '22 19:02

That rings a bell, but 10 minutes of digging only surfaced this old issue #1106 that touched on generating timeseries on a cartesian product of ID fields.
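For reference, one way to build a timeseries on a cartesian product of ID fields in plain pandas is `pd.MultiIndex.from_product` plus `reindex`. This is only a sketch with hypothetical toy data; it is not taken from #1106.

```python
import pandas as pd

# Hypothetical sparse monthly table: each generator reports in only one month.
gen_sparse = pd.DataFrame({
    "generator_id": ["a", "b"],
    "report_date": pd.to_datetime(["2020-01-01", "2020-06-01"]),
    "net_generation_mwh": [100.0, 50.0],
}).set_index(["generator_id", "report_date"])

# Every generator ID crossed with every month start of the year.
months = pd.date_range("2020-01-01", "2020-12-01", freq="MS")
full_index = pd.MultiIndex.from_product(
    [["a", "b"], months], names=["generator_id", "report_date"]
)

# Reindexing adds the missing generator-month rows, filled with NaN.
gen_full = gen_sparse.reindex(full_index).reset_index()
```

The result has one row per generator per month, with NaN wherever nothing was reported, so the more complete table no longer depends on which records happen to exist in the monthly data.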

TrentonBush • Feb 17 '22 20:02

@cmgosnell is this issue closed by #1608 from @grgmiller?

zaneselvans • Sep 12 '22 21:09