ElectricityLCI
Missing eGRID subregion generation by fuel category reference data
The electricity baseline provides a user-defined configuration value, 'egrid_year', which determines which data file, '~/electricitylci/data/egrid_subregion_generation_by_fuelcategory_reference_[year].csv', is read in 'egrid_energy.py' (referenced in 'generation_mix.py').
ElectricityLCI provides only two of these CSV files: 2014 and 2016. See electricitylci/data/egrid_subregion_generation_by_fuelcategory_reference_2016.csv.
This means that the ELCI_2 configuration model is unsupported. It also means that future baselines are hindered by the lack of this data file.
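A quick way to see which years are covered is to scan the data directory for files matching the naming pattern quoted above. This is only a minimal sketch: the helper name `available_reference_years` is hypothetical (it is not part of ElectricityLCI), and the demo runs against a throwaway directory rather than the real package data folder.

```python
import re
import tempfile
from pathlib import Path

# Filename pattern taken from the issue text above.
REF_PATTERN = re.compile(
    r"^egrid_subregion_generation_by_fuelcategory_reference_(\d{4})\.csv$"
)


def available_reference_years(data_dir):
    """Return the sorted years that have a reference CSV in data_dir.

    Hypothetical helper for illustration; not an ElectricityLCI function.
    """
    years = []
    for path in Path(data_dir).iterdir():
        match = REF_PATTERN.match(path.name)
        if match:
            years.append(int(match.group(1)))
    return sorted(years)


# Demo on a temporary directory that mimics the two shipped files.
with tempfile.TemporaryDirectory() as tmp:
    for yr in (2014, 2016):
        name = f"egrid_subregion_generation_by_fuelcategory_reference_{yr}.csv"
        Path(tmp, name).touch()
    demo_years = available_reference_years(tmp)

print(demo_years)
```

Any 'egrid_year' value outside the returned list would hit the missing-file problem described here.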
In order to support the current development and future baselines, a little more transparency is needed regarding the following:
- What is this reference data file?
- Where does it come from?
- How was it created?
https://github.com/USEPA/ElectricityLCI/blob/master/electricitylci/data/egrid_subregion_generation_by_fuelcategory_reference_2016.csv
~~Note that it does not appear that StEWI has the facility generation data from eGRID. Tried the various formats with "getInventory," but failed to find 'Electricity' data.~~
- https://github.com/USEPA/standardizedinventories/blob/master/stewi/init.py#L63
- https://github.com/USEPA/standardizedinventories/blob/master/stewi/formats.py#L10
Found it here:
- https://github.com/USEPA/standardizedinventories/blob/master/stewi/init.py#L137
stewi.getInventory('eGRID', year, stewiformat='flowbyfacility') will return a data frame that includes emissions and Electricity output as a flow.
Note also that stewi.getInventoryFacilities('eGRID', year) includes the fuel type by facility.
My guess is that some combination of these generated the files originally, but I do not know.
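To check that guess offline, the combination can be tried on a few made-up rows before touching stewi. This is a sketch under assumptions: the column names follow the ones used elsewhere in this thread, the values and facility IDs are invented, and `summarize` is a hypothetical name. Electricity is assumed to be reported in MJ, so dividing by 3600 gives MWh.

```python
import pandas as pd

REGION_COL = "eGRID subregion acronym"
FUEL_COL = "Plant primary coal/oil/gas/ other fossil fuel category"

# Made-up rows standing in for the flow-by-facility inventory.
inventory = pd.DataFrame({
    "FacilityID": ["1", "1", "2", "3"],
    "FlowName": ["Electricity", "Carbon dioxide", "Electricity", "Electricity"],
    "FlowAmount": [7200.0, 5.0, 3600.0, 1800.0],  # electricity rows in MJ
})

# Made-up rows standing in for the facility metadata.
facilities = pd.DataFrame({
    "FacilityID": ["1", "2", "3"],
    REGION_COL: ["AKGD", "AKGD", "CAMX"],
    FUEL_COL: ["GAS", "GAS", "COAL"],
})


def summarize(inventory, facilities):
    """Sum electricity by subregion and fuel category, in MWh."""
    # Keep only the electricity flow, then attach facility metadata.
    elec = inventory[inventory["FlowName"] == "Electricity"]
    merged = elec.merge(facilities, on="FacilityID")
    out = (
        merged.groupby([REGION_COL, FUEL_COL])["FlowAmount"]
        .sum()
        .reset_index()
        .rename(columns={
            REGION_COL: "Subregion",
            FUEL_COL: "FuelCategory",
            "FlowAmount": "Electricity",
        })
    )
    out["Electricity"] = out["Electricity"] / 3600.0  # MJ to MWh
    return out.sort_values(["FuelCategory", "Subregion"]).reset_index(drop=True)


summary = summarize(inventory, facilities)
print(summary)
# Rows: CAMX/COAL = 0.5 MWh, AKGD/GAS = 3.0 MWh
```

Comparing a table like this against the shipped 2016 CSV for the same year would be one way to confirm or reject the guess.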
Example code:
import logging
import os

import pandas as pd
from stewi import getInventoryFacilities
from stewi import getInventory

# NOTE: data_dir is not defined in this snippet. It is assumed to be the
# package data directory (i.e., ~/electricitylci/data) and to be available
# in the module where this function is placed.


def make_egrid_subregion_ref(year):
    """Generate the 'egrid_subregion_generation_by_fuelcategory_reference'
    CSV data file for a given year (if it does not already exist).

    Parameters
    ----------
    year : int
        Data year.
    """
    # Define the output file, which should be in data directory of package.
    ref_name = "egrid_subregion_generation_by_fuelcategory_reference_%s.csv" % year
    ref_path = os.path.join(data_dir, ref_name)

    if os.path.exists(ref_path):
        logging.info(
            "eGRID subregion generation inventory %s reference exists" % year)
    else:
        logging.info(
            "Creating eGRID subregion generation inventory "
            "%s reference CSV" % year)

        # Pull the inventory data from stewi.
        a = getInventory("eGRID", year)

        # Pull facility metadata from stewi for the same data year.
        meta_cols = [
            'FacilityID',
            'eGRID subregion acronym',
            'Plant primary coal/oil/gas/ other fossil fuel category'
        ]
        b = getInventoryFacilities("eGRID", year)[meta_cols]

        # Merge two data frames together to get inventory + facility metadata.
        c = pd.merge(
            left=a.query("FlowName == 'Electricity'"),
            right=b,
            on="FacilityID",
        )

        # Group by subregion and fuel category and sum to get total
        # electricity generation. Update column names to match existing
        # CSV files in the repo.
        c = c.groupby(
            by=[
                'eGRID subregion acronym',
                'Plant primary coal/oil/gas/ other fossil fuel category']
        )['FlowAmount'].agg('sum').reset_index()
        c = c.rename(columns={
            'eGRID subregion acronym': 'Subregion',
            'Plant primary coal/oil/gas/ other fossil fuel category': 'FuelCategory',
            'FlowAmount': 'Electricity'
        })

        # Convert Electricity from MJ to MWh (1 MWh = 3600 MJ); and order
        c['Electricity'] /= 3600.0
        c = c.sort_values(by=['FuelCategory', 'Subregion'])
        c.to_csv(ref_path, index=False)
^^^ The method above will be added to egrid_facilities.py and called at module level in egrid_energy.py, right before the file is accessed, so that the reference CSV is created first and a FileNotFoundError is avoided.
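The build-if-missing pattern described above can be sketched without stewi at all. Everything here is illustrative: `read_reference` and `fake_builder` are hypothetical names, the row written by the stand-in builder is made up, and the real call in egrid_energy.py may look different.

```python
import csv
import os
import tempfile


def read_reference(ref_path, builder):
    """Create the reference CSV with builder(ref_path) if missing, then read it."""
    if not os.path.exists(ref_path):
        builder(ref_path)
    with open(ref_path, newline="") as f:
        return list(csv.DictReader(f))


def fake_builder(ref_path):
    """Stand-in for make_egrid_subregion_ref; writes one made-up row."""
    with open(ref_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Subregion", "FuelCategory", "Electricity"])
        writer.writerow(["AKGD", "GAS", "3.0"])


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(
        tmp, "egrid_subregion_generation_by_fuelcategory_reference_2018.csv")
    # The file does not exist yet, so the builder runs before the read.
    rows = read_reference(path, fake_builder)
    # A second call finds the file and skips the builder.
    rows_again = read_reference(path, fake_builder)

print(rows)
```

Keeping the existence check next to the read means a missing year costs one extra build instead of a crash on import.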
NOTE: I found no reference to either "egrid_subregion_totals_reference_2016.csv" or "egrid_subregion_totals_reference_2014.csv" so I omitted their creation.