Memory spikes x10 if shapes are in a network
Checklist
- [x] I am using the current master branch or the latest release. Please indicate.
- [x] I am running on an up-to-date pypsa-eur environment. Update via conda env update -f envs/environment.yaml.
Describe the Bug
PR https://github.com/PyPSA/pypsa-eur/pull/1013 introduced a new feature: shapes are now stored inside the network files (in addition to the .geojson files stored in resources/). This is convenient for plotting; however, for large networks, reading a network now causes a massive memory spike compared to previous versions w/o n.shapes.
For example, take the workflow for the 50 node electricity-only network with an up-to-date pypsa-eur. Let's pick the build_powerplants rule from build_electricity.smk with its default 7GB memory allocation: https://github.com/PyPSA/pypsa-eur/blob/885a881e7824f40b109faedfbf88b46dff9f462b/rules/build_electricity.smk#L31-L50
The script build_powerplants.py requires ~10.6GB of memory for the 50 node network, whereas profiling the same script w/o the line that reads the base network shows that everything else requires only ~2.2GB. The legacy 7GB memory setting is thus no longer sufficient, which breaks the workflow with the default settings.
What's causing the memory spike?
Profiling the following test script shows that most of the ~10GB memory spike occurs in PyPSA/pypsa/io.py, at the xarray call self.ds = xr.open_dataset(path):
import pypsa
n = pypsa.Network("resources/test-50/networks/base.nc")
Now, if we drop n.shapes, write the network to netCDF, and read it again, the same line requires 80x less memory (120 MB):
n.mremove("Shape", n.shapes.index)
n.export_to_netcdf("resources/test-50/networks/base_noshapes.nc")
n = pypsa.Network("resources/test-50/networks/base_noshapes.nc")
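The comparison above can be reproduced with a small harness. The following is a minimal sketch using the standard-library tracemalloc module, not the profiler used for the numbers in this issue. The helper name peak_memory_mb and the stand-in loader are hypothetical. Note that tracemalloc only sees allocations made through Python's memory allocators, so memory allocated directly by C libraries may be under-reported; an external process-level profiler may give numbers closer to those quoted here.

```python
import tracemalloc
from typing import Any, Callable, Tuple


def peak_memory_mb(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call func(*args, **kwargs) and return (result, peak traced memory in MB)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak / 1e6


def stand_in_loader(n_bytes: int) -> bytearray:
    # Hypothetical stand-in so the sketch runs without PyPSA installed.
    # In practice, pass pypsa.Network and the path to base.nc instead.
    return bytearray(n_bytes)


if __name__ == "__main__":
    _, peak = peak_memory_mb(stand_in_loader, 50_000_000)
    print(f"peak traced memory: {peak:.0f} MB")
```

With PyPSA available, the two cases would be compared by calling peak_memory_mb(pypsa.Network, "resources/test-50/networks/base.nc") and the same with base_noshapes.nc.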
What can be done?
-- increase memory requirements within PyPSA-Eur and PyPSA-x (not ideal given the size of spikes)
-- make n.shapes optional in config (trade-off between convenience and sanity)
-- any workaround for xr.open_dataset(..)?
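The second option could look roughly like the sketch below. It assumes a boolean config key, here called store_shapes, which is hypothetical and not an existing PyPSA-Eur option; the function name is also made up for illustration. The removal itself uses the same n.mremove call as in the reproduction above.

```python
from typing import Any, Mapping


def drop_shapes_if_disabled(n: Any, config: Mapping[str, Any]) -> bool:
    """Remove all Shape components from n unless the config asks to keep them.

    Returns True if the shapes were removed, False if they were kept.
    """
    # "store_shapes" is a hypothetical key; shapes are kept by default.
    if config.get("store_shapes", True):
        return False
    # Same call as used in the reproduction earlier in this issue.
    n.mremove("Shape", n.shapes.index)
    return True
```

Calling this right before the network is exported would keep the current behaviour by default, while letting memory-constrained workflows opt out of storing shapes.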
xref #1238