EFS file extraction error, unable to extract zip format
Checklist
- [X] I am using the current master branch
- [X] I am running on an up-to-date pypsa-usa environment. Update via conda env update -f envs/environment.yaml
The Issue
I was running the command "snakemake -j1 --configfile config/config.default.yaml" when I found that the EFS file could not be decompressed. How can I fix this?
Steps To Reproduce
No response
Expected Behavior
No response
Error Message
[Thu Oct 31 00:37:18 2024]
Error in rule retrieve_nrel_efs_data:
jobid: 22
output: data/nrel_efs/EFSLoadProfile_Reference_Moderate.csv
log: logs/retrieve/retrieve_efs_Reference_Moderate.log (check log file(s) for error details)
RuleException:
CalledProcessError in file C:\Windows\System32\pypsa-usa\workflow\rules/retrieve.smk, line 71:
Command 'D:/Miniconda/envs/pypsa-usa/python.exe "C:\Windows\System32\pypsa-usa\workflow\.snakemake\scripts\tmpvztggb_a.retrieve_databundles.py"' returned non-zero exit status 1.
File "C:\Windows\System32\pypsa-usa\workflow\rules/retrieve.smk", line 71, in __rule_retrieve_nrel_efs_data
File "D:\Miniconda\envs\pypsa-usa\Lib\concurrent\futures\thread.py", line 58, in run
Removing output files of failed job retrieve_nrel_efs_data since they might be corrupted:
data/nrel_efs/EFSLoadProfile_Reference_Moderate.csv
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake\log\2024-10-30T231010.703325.snakemake.log
Anything else?
No response
Additional error information: EFSLoadProfile_Reference_Moderate.csv: Unsupported ZIP compression method (9: deflation-64-bit)
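To confirm which archive members trigger this message, the compression method of each entry can be read from the ZIP central directory without extracting anything. The following is a minimal sketch using only the standard library; the helper name deflate64_members is hypothetical and not part of PyPSA-USA, and the archive path in the usage note is an assumption based on the script in this thread.

```python
import zipfile

# ZIP compression method ID for Deflate64, as reported in the error message.
# The standard-library zipfile module cannot decompress this method.
DEFLATE64 = 9


def deflate64_members(archive_path):
    """Return the names of archive members stored with Deflate64 (method 9)."""
    with zipfile.ZipFile(archive_path) as zf:
        # Listing entries only reads the central directory, so it works even
        # when the members themselves cannot be decompressed.
        return [
            info.filename
            for info in zf.infolist()
            if info.compress_type == DEFLATE64
        ]
```

For example, calling deflate64_members("EFS.zip") on the downloaded archive should list the CSV files that need a Deflate64-capable extractor, assuming the download was saved under that name.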
Hey @huainie, thanks for reporting! What OS are you using? Windows? If you are on Windows, I would suggest running PyPSA-USA via WSL (this is what I do). I'm not sure we have any regular developers using Windows, so this may just be a Windows-specific bug. Sorry for the hassle!
@samdotson you found a fix for this on windows correct? Could you share what you did?
@trevorb1 Yes, I use Windows. @ktehranchi Yes, I solved the problem by finding the part of the code that does the unzipping and changing it to use Python's zipfile module for all the zipped files instead of relying on the system command tar. The original decompression method could not handle DEFLATE64.
I replaced the system command tar with Python's zipfile module to handle it
I solved the issue by installing 7-Zip and replacing the tar system command with 7z. Worked like a charm. The only challenge is that it depends on users having 7z installed (which could be added to the instructions). Alternatively, I could add a snakemake rule that checks for 7z and installs it if not found.
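The 7z approach could look roughly like the sketch below, which checks that the executable is on PATH before calling it. This is an illustration only, not the code that was actually used: the helper names build_7z_command and extract_with_7z are hypothetical, and it assumes the 7-Zip command-line tool is available under the name 7z.

```python
import shutil
import subprocess
from pathlib import Path


def build_7z_command(archive, dest, exe="7z"):
    """Build a 7-Zip command line that extracts `archive` into `dest`."""
    # "x" extracts with full paths, "-o<dir>" sets the output directory
    # (no space after -o), and "-y" answers yes to all prompts.
    return [exe, "x", str(archive), f"-o{dest}", "-y"]


def extract_with_7z(archive, dest):
    """Extract with 7-Zip, failing early with a clear message if 7z is missing."""
    exe = shutil.which("7z")
    if exe is None:
        raise FileNotFoundError(
            "7z not found on PATH; install 7-Zip and add it to PATH."
        )
    Path(dest).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_7z_command(archive, dest, exe), check=True)
```

Failing early with an explicit message keeps the dependency visible to users instead of surfacing as a generic CalledProcessError from the rule.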
On Windows, try replacing retrieve_databundles.py with the following:
"""Script retrieves data from various zenodo repositories specified by the snakemake rule. Used by multiple snakemake rules."""
import logging
import platform
import subprocess
import zipfile
from pathlib import Path
import zipfile_deflate64 # For Windows OS, use zipfile-deflate64
from _helpers import configure_logging, progress_retrieve
logger = logging.getLogger(__name__)
# Note: when adding files to pypsa_usa_data.zip, be sure to zip the folder w/o the root folder included:
# ` cd pypsa_usa_data && zip -r ../pypsa_usa_data.zip . `
def download_repository(url, rootpath, repository):
    # Save locations
    if repository == "USATestSystem":
        subdir = "breakthrough_network/"
    elif repository == "EFS":
        subdir = "nrel_efs/"
    else:
        subdir = ""
    tarball_fn = Path(f"{rootpath}/{repository}.zip")
    to_fn = Path(f"{rootpath}/data/{subdir}")
    logger.info(f"Downloading {repository} zenodo repository from '{url}'.")
    progress_retrieve(url, tarball_fn)
    logger.info(f"Extracting {repository} databundle.")
    if repository == "EFS":
        if platform.system() == "Windows":  # For Windows OS, use zipfile-deflate64
            with zipfile_deflate64.ZipFile(tarball_fn, "r") as zip_ref:
                zip_ref.extractall(to_fn)
        else:
            cmd = ["unzip", tarball_fn, "-d", to_fn]
            subprocess.run(cmd, check=True)
    else:
        with zipfile.ZipFile(tarball_fn, "r") as zip_ref:
            zip_ref.extractall(to_fn)
    logger.info(f"{repository} Databundle available in {to_fn}")
if __name__ == "__main__":
    if "snakemake" not in globals():
        from _helpers import mock_snakemake
        # snakemake = mock_snakemake("retrieve_zenodo_databundles")
        # snakemake = mock_snakemake('retrieve_sector_databundle')
        snakemake = mock_snakemake(
            "retrieve_nrel_efs_data",
            efs_case="Reference",
            efs_speed="Moderate",
        )
        rootpath = ".."
    else:
        rootpath = "."
    configure_logging(snakemake)
    repositories = snakemake.params[0]
    for repository in repositories:
        url = repositories[repository]
        download_repository(url, rootpath, repository)
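One possible variation on the script above, shown only as a sketch: instead of importing zipfile_deflate64 at module level and branching on the platform, try the standard zipfile module first and fall back to zipfile_deflate64 only when the archive uses an unsupported compression method. The helper name extract_zip is hypothetical, and the fallback assumes the third-party zipfile-deflate64 package is installed when it is needed.

```python
import zipfile


def extract_zip(archive_path, dest_dir):
    """Extract with stdlib zipfile; fall back to zipfile_deflate64 if needed.

    Returns the name of the module that performed the extraction.
    """
    try:
        with zipfile.ZipFile(archive_path, "r") as zip_ref:
            zip_ref.extractall(dest_dir)
        return "zipfile"
    except NotImplementedError:
        # The stdlib raises NotImplementedError for compression methods it
        # does not support, such as Deflate64 (method 9).
        try:
            # Third-party package, imported lazily so that it is only
            # required when the archive actually needs it.
            import zipfile_deflate64
        except ImportError as err:
            raise RuntimeError(
                "Archive uses an unsupported compression method; "
                "install zipfile-deflate64 to extract it."
            ) from err
        with zipfile_deflate64.ZipFile(archive_path, "r") as zip_ref:
            zip_ref.extractall(dest_dir)
        return "zipfile_deflate64"
```

With a helper like this, download_repository could call extract_zip(tarball_fn, to_fn) for every repository, which would avoid both the platform check and the dependency on an external unzip command.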