PatoolError when running the ETL example

Open neelasha23 opened this issue 2 years ago • 3 comments

Running this pipeline (https://github.com/ploomber/projects/tree/master/templates/etl) fails with the following error:

PatoolError: patool can not unpack
patool error: error extracting ../ploomber/templates/etl/output/data.7z: could not find an executable program to extract format 7z; candidates are (7z,7za,7zr),

I fixed it by replacing extractall in preprocess/download.py with:

import shutil
from py7zr import unpack_7zarchive  # requires the py7zr package
shutil.register_unpack_format('7zip', ['.7z'], unpack_7zarchive)
shutil.unpack_archive(product['zipped'], product['extracted'])
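
One caveat with this approach is that shutil.register_unpack_format raises a RegistryError if the .7z extension is registered a second time in the same process. The following is only an illustrative sketch of how the registration might be guarded, not the code from the example itself; the helper name register_7z_support is hypothetical, and it assumes py7zr may or may not be installed.

```python
import shutil


def register_7z_support() -> bool:
    """Register a '7zip' unpack format with shutil, if py7zr is importable.

    Hypothetical helper: returns True when .7z archives can be unpacked
    through shutil.unpack_archive, False when py7zr is not installed.
    """
    try:
        from py7zr import unpack_7zarchive
    except ImportError:
        return False

    # get_unpack_formats() yields (name, extensions, description) tuples
    already_registered = {name for name, _, _ in shutil.get_unpack_formats()}
    if "7zip" not in already_registered:
        shutil.register_unpack_format("7zip", [".7z"], unpack_7zarchive)
    return True


if __name__ == "__main__":
    # Safe to call more than once; the second call skips the registration
    print(register_7z_support())
    print(register_7z_support())
```

With a guard like this, the task can be re-run in a long-lived session without tripping over the earlier registration.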

neelasha23 avatar May 02 '22 09:05 neelasha23

Great catch! Want to submit a PR? Please make sure we avoid adding a new package dependency, since the dependency graph is already quite heavy.

idomic avatar May 02 '22 13:05 idomic

Sure! I can check whether there's another way. The above change requires py7zr as a dependency.

neelasha23 avatar May 02 '22 14:05 neelasha23

Hi, thanks for reporting this! This is an issue with conda: sometimes it fails to find the appropriate package. If you can find a simple way to replace the package with something that's easier to install, that'd be great. Alternatively, rewriting the example might be better.

When I wrote the initial example, I made the mistake of using a dataset that has this weird 7z compression. But we could really use any dataset, as long as the example stays the same: download data from the internet, upload it to a database, process it with SQL, and then have some Python for visualization.

edublancas avatar May 06 '22 04:05 edublancas
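
For reference, a rewritten example along the lines suggested above could stay entirely within the standard library if the dataset were shipped as a .zip file. The sketch below is only an assumption about what that might look like: the function names, the sales table, and the tiny in-memory CSV are all hypothetical, and the download step is replaced by a locally generated archive so the snippet runs offline. The visualization step is left out.

```python
import csv
import shutil
import sqlite3
import tempfile
import zipfile
from pathlib import Path


def make_sample_archive(path):
    """Stand-in for the download step: write a tiny CSV into a .zip file."""
    rows = "city,amount\nA,10\nB,5\nA,7\n"  # made-up sample data
    with zipfile.ZipFile(path, "w") as archive:
        archive.writestr("data.csv", rows)


def run_pipeline(workdir):
    """Unpack the archive, load it into SQLite, and aggregate with SQL."""
    zipped = workdir / "data.zip"
    extracted = workdir / "extracted"
    make_sample_archive(zipped)

    # .zip is one of shutil's built-in formats, so nothing needs registering
    shutil.unpack_archive(zipped, extracted)

    conn = sqlite3.connect(workdir / "data.db")
    try:
        conn.execute("CREATE TABLE sales (city TEXT, amount INTEGER)")
        with open(extracted / "data.csv", newline="") as f:
            # Each CSV row is a dict, matched to the named placeholders
            conn.executemany(
                "INSERT INTO sales VALUES (:city, :amount)",
                csv.DictReader(f),
            )
        conn.commit()
        query = "SELECT city, SUM(amount) FROM sales GROUP BY city ORDER BY city"
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_pipeline(Path(tmp)))
```

Because zip support ships with Python, a variant like this would sidestep both the patool executable lookup and the extra py7zr dependency.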