`build_powerplants` duckdb issue on Linux
Checklist
- [x] I am using the current `master` branch
- [x] I am running on an up-to-date `pypsa-usa` environment. Update via `conda env update -f envs/environment.yaml`
The Issue
In build_powerplants, duckdb keeps raising the error below. It only ever occurs in the get_heat_rates query. Looking at the duckdb GitHub, I believe it is an issue on their side (i.e., not an issue with pypsa-usa). I will leave my fix here for the time being, though, just in case someone else runs into the same issue :)
After trying a bunch of different suggestions, the solution in this comment ended up working for me. Specifically, I added the following lines to the top of the query in get_heat_rates. You will need to tune the values to your system, though. Also, this caused build_powerplants to take quite a bit longer to run.
SET threads TO 12;
SET memory_limit = '64GB';
SET temp_directory='/mnt/data/tmp';
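If hard-coding the values is awkward, the three SET statements above could be generated per machine instead. The following is only a rough sketch: `with_duckdb_settings` is a hypothetical helper (it is not part of pypsa-usa or duckdb), the `8GB` default is a placeholder, and it assumes the temp directory path contains no single quotes.

```python
import os
import tempfile


def with_duckdb_settings(query, threads=None, memory_limit="8GB", temp_directory=None):
    """Prepend DuckDB SET statements to a SQL query string.

    Hypothetical helper, not part of pypsa-usa. The defaults are
    placeholders and should be tuned to the machine.
    """
    if threads is None:
        # Fall back to the number of CPUs Python can see.
        threads = os.cpu_count() or 1
    if temp_directory is None:
        # Fall back to the system temp directory for spill files.
        temp_directory = tempfile.gettempdir()
    settings = [
        f"SET threads TO {int(threads)};",
        f"SET memory_limit = '{memory_limit}';",
        f"SET temp_directory = '{temp_directory}';",
    ]
    return "\n".join(settings) + "\n" + query


# Reproduces the settings from this issue in front of a dummy query.
sql = with_duckdb_settings(
    "SELECT 42 AS answer;",
    threads=12,
    memory_limit="64GB",
    temp_directory="/mnt/data/tmp",
)
print(sql)
```

The resulting string would then be passed to `duckdb.query(...)` the same way get_heat_rates already does, in place of the original query text.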
Steps To Reproduce
I can only reproduce this on a lab computer with a specific version of Linux. When I run via WSL on other computers, this issue does not appear. So it seems to be tied to the specific version of Linux I am running. Again, I think this is a duckdb issue, and not a pypsa-usa issue.
$ hostnamectl
Static hostname:
Icon name:
Chassis: desktop
Machine ID:
Boot ID:
Operating System: Ubuntu 20.04.6 LTS
Kernel: Linux 5.15.0-67-generic
Architecture: x86-64
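To compare this machine against the WSL setups where the issue does not appear, it may help to record the same details from inside the conda environment. This is a small sketch using only the standard library; `environment_report` is a hypothetical helper, and checking only the `duckdb` package is my assumption about what is relevant.

```python
import platform
from importlib import metadata


def environment_report(packages=("duckdb",)):
    """Collect OS details and installed package versions into a dict."""
    report = {
        "system": platform.system(),
        "release": platform.release(),
        "machine": platform.machine(),
        "python": platform.python_version(),
    }
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in the active environment.
            report[name] = "not installed"
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

Running it on both the affected and unaffected machines gives two reports that can be compared side by side.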
Expected Behavior
No response
Error Message
rule build_powerplants:
input: repo_data/WECC_ADS_public, repo_data/WECC_ADS_public/eia_ads_generator_mapping_updated.csv, repo_data/plants/fuelCost22.csv, repo_data/plants/cems_heat_rates.xlsx, repo_data/plants/epa_eia_crosswalk.csv
output: resources/issue604_elec/powerplants.csv
log: logs/build_powerplants.log
jobid: 23
reason: Missing output files: resources/issue604_elec/powerplants.csv
resources: tmpdir=/tmp
Traceback (most recent call last):
File "/local-scratch/localhome/tmb8/repos/pypsa-usa/workflow/.snakemake/scripts/tmphe_l5771.build_powerplants.py", line 984, in <module>
eia_data_operable, heat_rates = load_pudl_data(
^^^^^^^^^^^^^^^
File "/local-scratch/localhome/tmb8/repos/pypsa-usa/workflow/.snakemake/scripts/tmphe_l5771.build_powerplants.py", line 124, in load_pudl_data
heat_rates = get_heat_rates(start_date, end_date)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local-scratch/localhome/tmb8/repos/pypsa-usa/workflow/.snakemake/scripts/tmphe_l5771.build_powerplants.py", line 122, in get_heat_rates
return duckdb.query(query).to_df()
^^^^^^^^^^^^^^^^^^^
duckdb.duckdb.OutOfMemoryException: Out of Memory Error: Failed to allocate block of 262144 bytes
Anything else?
No response