Package not Self-Contained
Thank you for open-sourcing this handy tool! I tried installing the package both from pip and from source, but neither works out of the box. On my end (Ubuntu with Python 3.10), running the command metacrafter scan-file --format short <file name> gives me this error:
Traceback (most recent call last):
  File "~/miniconda3/envs/metacrafter/bin/metacrafter", line 33, in <module>
    sys.exit(load_entry_point('metacrafter==0.0.4', 'console_scripts', 'metacrafter')())
  File "~/miniconda3/envs/metacrafter/lib/python3.10/site-packages/metacrafter-0.0.4-py3.10.egg/metacrafter/__main__.py", line 10, in main
    from .core import cli
  File "~/miniconda3/envs/metacrafter/lib/python3.10/site-packages/metacrafter-0.0.4-py3.10.egg/metacrafter/core.py", line 18, in <module>
    from iterable.helpers.detect import open_iterable
ModuleNotFoundError: No module named 'iterable'
Even after I installed iterabledata 1.0.5 from pip, I ran into another error: AttributeError: module 'snappy' has no attribute 'decompress'. Could you please look into the issue? Thanks in advance.
@superctj Hi! It looks like I declared several dependencies incorrectly in the package. I will fix it ASAP, thanks!
I think you need to install python-snappy with pip install python-snappy
More info here https://stackoverflow.com/questions/48535799/module-snappy-has-no-attribute-decompress
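For anyone hitting the same AttributeError, here is a minimal sketch of how to check which kind of snappy module Python is picking up. It assumes only that python-snappy exposes snappy.compress and snappy.decompress; the helper names load_module and diagnose_snappy are made up for this illustration and are not part of metacrafter or iterabledata.

```python
import importlib
import types
from typing import Optional


def load_module(name: str) -> Optional[types.ModuleType]:
    """Import a module by name, returning None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def diagnose_snappy(module: Optional[types.ModuleType]) -> str:
    """Classify a candidate `snappy` module (hypothetical helper).

    Returns "missing" if nothing was imported, "ok" if the module has
    callable compress/decompress functions (as python-snappy does), and
    "wrong-package" if some other module named `snappy` was imported.
    """
    if module is None:
        return "missing"
    has_codec = all(
        callable(getattr(module, attr, None))
        for attr in ("compress", "decompress")
    )
    return "ok" if has_codec else "wrong-package"


# Usage (run inside the environment where metacrafter is installed):
#   print(diagnose_snappy(load_module("snappy")))
```

If this reports "wrong-package", the module named snappy in the environment is probably not the one from python-snappy, which would match the error in this thread.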
Fixed in the main branch; the fix will be included in the next package release.
Thank you @ivbeg for the quick action! I appreciate it.
Hi again @ivbeg, FYI: when I installed the package from the main branch, I ran into ModuleNotFoundError: No module named 'Cython'. After I installed Cython, the installation completed, but when I ran the file scan command, the AttributeError: module 'snappy' has no attribute 'decompress' popped up again. Running pip install python-snappy fixed that error. However, I then got a parquet.ParquetFormatException: Unsupported encoding: RLE_DICTIONARY when scanning a parquet file. Do you have any idea what causes this?
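A quick way to see which of the modules mentioned so far are missing from an environment is sketched below. The function name find_missing_modules is invented for this example, and the module list in the usage comment is only the set of names that appeared in the errors above, not an authoritative dependency list for metacrafter.

```python
import importlib.util
from typing import Iterable, List


def find_missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found for import."""
    missing = []
    for name in names:
        # find_spec returns None when no importable module has this name.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


# Usage, with the module names taken from the errors in this thread:
#   print(find_missing_modules(["iterable", "snappy", "Cython"]))
```

Note that this only checks whether a module with that name exists. Finding a module named snappy does not show that it is the one provided by python-snappy.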
@superctj Not yet. It works with almost all the parquet files I have tested. Could you share this file, please?
Thank you for your quick response! GitHub does not support attaching parquet files so I put the sample file in Google Drive. Let me know if you cannot access the file.
@superctj Thanks. I use the pure-Python parquet library https://pypi.org/project/parquet/ to read parquet files because it provides simple iteration functions, but it looks like it doesn't support this type of encoding. I will check a bit later whether I can easily replace it with the pyarrow parquet reader.
@superctj Finally fixed: I replaced the parquet library with pyarrow. The changes are in the iterabledata library, so you need to reinstall it from the main branch of its source code repository: https://github.com/apicrafter/pyiterable
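For readers curious what row-by-row iteration over a parquet file looks like with pyarrow, here is a minimal sketch. It is not the actual iterabledata implementation; the function name iter_parquet_rows and the default batch size are assumptions made for illustration, and it assumes a pyarrow version recent enough to provide RecordBatch.to_pylist().

```python
from typing import Any, Dict, Iterator


def iter_parquet_rows(path: str, batch_size: int = 1024) -> Iterator[Dict[str, Any]]:
    """Yield each row of a parquet file as a plain dict (illustrative helper)."""
    # Imported lazily so this module still loads where pyarrow is absent.
    import pyarrow.parquet as pq

    parquet_file = pq.ParquetFile(path)
    # iter_batches streams the file in chunks instead of loading it whole.
    for batch in parquet_file.iter_batches(batch_size=batch_size):
        # to_pylist() converts a RecordBatch into a list of row dicts.
        for row in batch.to_pylist():
            yield row


# Usage (the file name is a placeholder):
#   for row in iter_parquet_rows("sample.parquet"):
#       print(row)
```

Reading in batches keeps memory use bounded on large files while still giving the simple per-row iteration that the pure-Python library offered.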
Thank you @ivbeg for the quick action! I will probably give it a shot later.