`build_powerplants` duckdb issue on Linux

Open trevorb1 opened this issue 7 months ago • 0 comments

Checklist

  • [x] I am using the current master branch
  • [x] I am running on an up-to-date pypsa-usa environment. Update via conda env update -f envs/environment.yaml

The Issue

In build_powerplants, duckdb keeps raising the error below. It only ever occurs in the get_heat_rates query. Looking at the duckdb GitHub, I believe it is an issue on their side (i.e., not an issue with pypsa-usa). I will leave my fix here for the time being, though, just in case someone else runs into the same issue :)

After trying a bunch of different suggestions, the solution in this comment ended up working for me. Specifically, I added the following lines to the top of the query in get_heat_rates. You will need to tune the values to your system, though. Also, this caused build_powerplants to take quite a bit longer to run.

SET threads TO 12;
SET memory_limit = '64GB';
SET temp_directory='/mnt/data/tmp';
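
For anyone who would rather not hard-code these values in the query, here is a rough sketch of how the three SET lines could be prepended from Python. The helper name prepend_duckdb_settings and the placeholder query are made up for illustration and are not part of pypsa-usa; the defaults you pass in should match your own machine.

```python
def prepend_duckdb_settings(query, threads, memory_limit, temp_directory):
    """Return `query` with DuckDB SET statements placed in front of it.

    Hypothetical helper, not part of pypsa-usa or duckdb.
    """
    settings = [
        f"SET threads TO {int(threads)};",
        f"SET memory_limit = '{memory_limit}';",
        f"SET temp_directory = '{temp_directory}';",
    ]
    return "\n".join(settings + [query])


# Placeholder query; the real one lives in get_heat_rates.
example = prepend_duckdb_settings(
    "SELECT 1;",
    threads=12,
    memory_limit="64GB",
    temp_directory="/mnt/data/tmp",
)
print(example)
```

The resulting string is what would then be handed to the duckdb.query(query).to_df() call shown in the traceback below, the same way the hand-edited query is.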

Steps To Reproduce

I can only reproduce this on a lab computer with a specific version of Linux. When I run via WSL on other computers, this issue does not appear. So it seems to be tied to the specific version of Linux I am running. Again, I think this is a duckdb issue, and not a pypsa-usa issue.

$ hostnamectl
   Static hostname: 
         Icon name: 
           Chassis: desktop
        Machine ID: 
           Boot ID: 
  Operating System: Ubuntu 20.04.6 LTS
            Kernel: Linux 5.15.0-67-generic
      Architecture: x86-64

Expected Behavior

No response

Error Message

rule build_powerplants:
    input: repo_data/WECC_ADS_public, repo_data/WECC_ADS_public/eia_ads_generator_mapping_updated.csv, repo_data/plants/fuelCost22.csv, repo_data/plants/cems_heat_rates.xlsx, repo_data/plants/epa_eia_crosswalk.csv
    output: resources/issue604_elec/powerplants.csv
    log: logs/build_powerplants.log
    jobid: 23
    reason: Missing output files: resources/issue604_elec/powerplants.csv
    resources: tmpdir=/tmp

Traceback (most recent call last):
  File "/local-scratch/localhome/tmb8/repos/pypsa-usa/workflow/.snakemake/scripts/tmphe_l5771.build_powerplants.py", line 984, in <module>
    eia_data_operable, heat_rates = load_pudl_data(
                                    ^^^^^^^^^^^^^^^
  File "/local-scratch/localhome/tmb8/repos/pypsa-usa/workflow/.snakemake/scripts/tmphe_l5771.build_powerplants.py", line 124, in load_pudl_data
    heat_rates = get_heat_rates(start_date, end_date)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/local-scratch/localhome/tmb8/repos/pypsa-usa/workflow/.snakemake/scripts/tmphe_l5771.build_powerplants.py", line 122, in get_heat_rates
    return duckdb.query(query).to_df()
           ^^^^^^^^^^^^^^^^^^^
duckdb.duckdb.OutOfMemoryException: Out of Memory Error: Failed to allocate block of 262144 bytes

Anything else?

No response

trevorb1 • May 27 '25 15:05