ElectricityLCI
EIA coalpublic2021.xls Excel file format cannot be determined
The URL used to pull the 2021 coal public Microsoft Excel workbook (http://www.eia.gov/coal/data/public/xls/coalpublic2021.xls) returns a file whose format does not match its extension. On inspection, the file appears to be an XML spreadsheet (perhaps a mis-click in "Save As"?).
If you force-load the file into Excel, it will open. After enabling editing, I fixed the problem by re-saving the XML spreadsheet as xls in the expected f7a_2021 folder. I am hopeful that this problem may go away; however, the vintage of the file format is a little concerning (pre-2003).
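For anyone who wants to confirm the mismatch without opening Excel, a rough check of the file's leading bytes is enough. The sketch below is an illustration only and is not part of ElectricityLCI; the function names `sniff_spreadsheet_format` and `sniff_file` are hypothetical. It assumes the usual signatures: the OLE2 header for binary xls, the ZIP header for xlsx, and an `<?xml` declaration for a plain-text XML spreadsheet.

```python
# Leading bytes ("magic numbers") of the containers Excel commonly writes.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy binary .xls
ZIP_MAGIC = b"PK\x03\x04"                        # .xlsx (zipped XML parts)
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_spreadsheet_format(head: bytes) -> str:
    """Guess a spreadsheet container from the first bytes of a file."""
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    # Plain-text XML may begin with a UTF-8 byte-order mark or whitespace.
    text = head
    if text.startswith(UTF8_BOM):
        text = text[len(UTF8_BOM):]
    if text.lstrip().startswith(b"<?xml"):
        return "xml"
    return "unknown"


def sniff_file(path) -> str:
    """Read the first 64 bytes of `path` and classify them."""
    with open(path, "rb") as f:
        return sniff_spreadsheet_format(f.read(64))
```

If `sniff_file` reports `"xml"` for a file named `*.xls`, the extension is misleading, which would be consistent with the "format cannot be determined" error pandas raises.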
I see no need to write a new handler for the pandas read_excel call in the generate_upstream_coal_map function of coal_upstream.py, which is where the error is thrown (pasted below for posterity).
Model ELCI_2021 selected.
2024-02-22 15:55:44.450:INFO:model_config:_load_model_specs:Loading model specs
2024-02-22 15:55:44.458:INFO:model_config:check_model_specs:Checking model specs
2024-02-22 15:55:44.460:INFO:model_config:check_model_specs:Checks passed!
2024-02-22 15:55:44.462:INFO:model_config:build_model_class:Model Specs for ELCI_2021
2024-02-22 15:55:44.463:INFO:<string>:run_generation:get upstream process
2024-02-22 15:55:44.463:INFO:__init__:get_upstream_process_df:Generating upstream inventories...
2024-02-22 15:55:44.464:INFO:coal_upstream:read_eia923_fuel_receipts:Loading data from previously downloaded excel file
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[2], line 1
----> 1 exec(open("electricitylci/main.py").read())
File <string>:247
File <string>:97, in main()
File <string>:177, in run_generation()
File ~\ElectricityLCI\electricitylci\__init__.py:453, in get_upstream_process_df(eia_gen_year)
450 import electricitylci.combinator as combine
452 logging.info("Generating upstream inventories...")
--> 453 coal_df = coal.generate_upstream_coal(eia_gen_year)
454 ng_df = ng.generate_upstream_ng(eia_gen_year)
455 petro_df = petro.generate_petroleum_upstream(eia_gen_year)
File ~\ElectricityLCI\electricitylci\coal_upstream.py:541, in generate_upstream_coal(year)
517 """
518 Generate the annual coal mining and transportation emissions (in kg) for
519 each plant in EIA923.
(...)
538 minimerge.drop(
539 """
540 # Read the coal input from eia
--> 541 coal_input_eia = generate_upstream_coal_map(year)
542 # Read coal transportation and mining data
543 coal_transportation = pd.read_csv(
544 os.path.join(data_dir, '2016_Coal_Trans_By_Plant_ABB_Data.csv')
545 )
File ~\ElectricityLCI\electricitylci\coal_upstream.py:284, in generate_upstream_coal_map(year)
279 else:
280 eia7a_path = find_file_in_folder(
281 folder_path=expected_7a_folder,
282 file_pattern_match=['coalpublic'],
283 return_name=False)
--> 284 eia7a_df = pd.read_excel(
285 eia7a_path,
286 sheet_name='Hist_Coal_Prod',
287 skiprows=3
288 )
289 eia7a_df = _clean_columns(eia7a_df)
290 coal_criteria = eia_fuel_receipts_df['fuel_group']=='Coal'
ValueError: Excel file format cannot be determined, you must specify an engine manually.
Note: the same error occurs with the 2022 file, and the same manual fix corrected it.
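Should the mismatch persist in later releases, one possible fallback is to parse the XML directly instead of re-saving by hand. The sketch below is a guess at what that could look like, not a tested fix for ElectricityLCI: it assumes the file follows the XML Spreadsheet 2003 (SpreadsheetML) layout of Workbook, Worksheet, Table, Row, Cell, and Data elements, and `read_spreadsheetml` is a hypothetical helper name. Only the standard library is used.

```python
import xml.etree.ElementTree as ET

# Namespace used by XML Spreadsheet 2003 elements and ss:* attributes.
SS = "urn:schemas-microsoft-com:office:spreadsheet"
NS = {"ss": SS}


def read_spreadsheetml(xml_text, sheet_name, skiprows=0):
    """Return the rows of one worksheet as lists of cell text.

    Assumes the SpreadsheetML 2003 layout; every value comes back as a
    string (or None for an empty cell), so type conversion is left to
    the caller. Row-level ss:Index gaps are not expanded.
    """
    root = ET.fromstring(xml_text)
    for sheet in root.findall("ss:Worksheet", NS):
        if sheet.get(f"{{{SS}}}Name") == sheet_name:
            break
    else:
        raise ValueError(f"Worksheet {sheet_name!r} not found")

    rows = []
    for row in sheet.findall("ss:Table/ss:Row", NS):
        values = []
        for cell in row.findall("ss:Cell", NS):
            # A 1-based ss:Index means empty cells were skipped before
            # this one; pad with None so columns stay aligned.
            index = cell.get(f"{{{SS}}}Index")
            if index is not None:
                values.extend([None] * (int(index) - 1 - len(values)))
            data = cell.find("ss:Data", NS)
            values.append(data.text if data is not None else None)
        rows.append(values)
    return rows[skiprows:]
```

Calling `read_spreadsheetml(text, "Hist_Coal_Prod", skiprows=3)` would mirror the arguments of the failing read_excel call; the returned rows could then be passed to `pandas.DataFrame`, with the first row used as column names, provided the rows all have the same length.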