Fast Avro Decoder not included in Conda Deployment of pyiceberg
Feature Request / Improvement
When you install pyiceberg via conda, you get warnings like:
/home/coder/.conda/envs/coder/lib/python3.10/site-packages/pyiceberg/avro/decoder.py:185: UserWarning: Falling back to pure Python Avro decoder, missing Cython implementation
  warnings.warn("Falling back to pure Python Avro decoder, missing Cython implementation")
This means the conda package is missing the fast (Cython) Avro decoder. It would be great to have it included to speed up our queries!
I'm not sure how conda deals with Cython extensions,
but here's the relevant code: https://github.com/apache/iceberg-python/blob/e4c1748fee220076f04e35ab2f182dd51ca20705/pyiceberg/avro/decoder.py#L185
https://github.com/apache/iceberg-python/blob/e4c1748fee220076f04e35ab2f182dd51ca20705/build-module.py#L44-L52
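For readers unfamiliar with the pattern behind this warning, here is a minimal sketch of an import-with-fallback loader. It is not the pyiceberg source; `load_implementation` and its parameters are hypothetical names used only for illustration. The idea is to try importing the compiled module and, if it is absent, warn and return a pure Python stand-in.

```python
import importlib
import warnings


def load_implementation(fast_module: str, fast_name: str, fallback: type) -> type:
    """Return the class `fast_name` from `fast_module`, or `fallback` if the module is absent.

    Hypothetical helper for illustration; not part of pyiceberg.
    """
    try:
        module = importlib.import_module(fast_module)
    except ModuleNotFoundError:
        # The compiled extension was not built or not shipped with the package.
        warnings.warn(f"Falling back to pure Python implementation, missing {fast_module}")
        return fallback
    return getattr(module, fast_name)
```

With this shape, a missing extension never breaks the import of the package itself; it only costs speed, which matches the behavior reported in this issue.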
I see the same issue when installing from PyPI (Windows):
D:\code\...\.venv\Lib\site-packages\pyiceberg\avro\decoder.py:185: UserWarning: Falling back to pure Python Avro decoder, missing Cython implementation
https://github.com/apache/iceberg-python/blob/9b9ed534b2022cb9a687f4ed876fadcc2457b31b/pyiceberg/avro/decoder.py#L177-L187
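A quick way to check whether the compiled decoder is present in a given environment, without triggering the warning, is to ask the import system whether it can locate the module. This is a sketch using only the standard library; `has_module` is a hypothetical helper name, and the module path is the one shown in the tracebacks in this thread.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if `name` can be located by the import system."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. pyiceberg itself) is not installed.
        return False


if __name__ == "__main__":
    # Module path taken from the error messages reported in this issue.
    print("fast decoder available:", has_module("pyiceberg.avro.decoder_fast"))
```

If this prints False in a fresh environment, the installed package was most likely built without the Cython extension.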
Adding some context to the Avro decoder build process.
We use Poetry to build the Avro decoder via this script https://github.com/apache/iceberg-python/blob/b5933756b5b488ec51cd56d5984731b6cc347f2b/pyproject.toml#L583
https://github.com/apache/iceberg-python/blob/b5933756b5b488ec51cd56d5984731b6cc347f2b/build-module.py
You can trigger the build manually by running:
poetry build
Depending on the platform, there might be some missing piece that's not allowing the build to succeed.
@JanKrl can you try the above command and paste the output here for debugging?
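To narrow down which piece might be missing before running the build, a small check like the following can help. It is only a sketch under stated assumptions: that the build needs the Cython package, a C compiler, and the Python development headers. The compiler names tried here are common defaults, and on Windows the compiler may be installed without being on PATH, so a MISSING result there is not conclusive. `build_prerequisites` is a hypothetical helper name.

```python
import importlib.util
import os
import shutil
import sysconfig


def build_prerequisites() -> dict[str, bool]:
    """Report whether the usual pieces for compiling a Cython extension are present."""
    include_dir = sysconfig.get_paths()["include"]
    return {
        # Cython itself must be importable by the interpreter running the build.
        "cython": importlib.util.find_spec("Cython") is not None,
        # Any one of these common compiler executables on PATH counts.
        "c_compiler": any(shutil.which(name) for name in ("cc", "gcc", "clang", "cl")),
        # Python.h ships with the development headers (python3-dev on Debian/Ubuntu).
        "python_headers": os.path.exists(os.path.join(include_dir, "Python.h")),
    }


if __name__ == "__main__":
    for name, ok in build_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```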
Deleting python environment (env) solved the problem for me. So, (un)fortunately I'm not able to reproduce it anymore.
Great to hear. Did you clean the env manually or use make clean?
I use venv, so I removed the .venv directory and created it again.
I must have missed the "Falling back to pure Python Avro decoder, missing Cython implementation" warning during installation, but I was indeed missing Cython, which was causing a different decoder_fast related error during make test:
~/iceberg-python$ make test
poetry run pytest tests/ -m "(unmarked or parametrize) and not integration"
=========================================================== test session starts ============================================================
platform linux -- Python 3.10.12, pytest-7.4.4, pluggy-1.5.0
rootdir: /mnt/c/Users/.../iceberg-python
configfile: pyproject.toml
plugins: mock-3.14.0, checkdocs-2.10.1, requests-mock-1.12.1, lazy-fixture-0.6.3
collected 3588 items / 2 errors / 865 deselected / 2723 selected
================================================================== ERRORS ==================================================================
_______________________________________________ ERROR collecting tests/avro/test_decoder.py ________________________________________________
ImportError while importing test module '/mnt/c/Users/.../iceberg-python/tests/avro/test_decoder.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.10/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/avro/test_decoder.py:29: in <module>
from pyiceberg.avro.decoder_fast import CythonBinaryDecoder
E ModuleNotFoundError: No module named 'pyiceberg.avro.decoder_fast'
________________________________________________ ERROR collecting tests/avro/test_reader.py ________________________________________________
ImportError while importing test module '/mnt/c/Users/.../iceberg-python/tests/avro/test_reader.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.10/importlib/__init__.py:126: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
tests/avro/test_reader.py:24: in <module>
from pyiceberg.avro.decoder_fast import CythonBinaryDecoder
E ModuleNotFoundError: No module named 'pyiceberg.avro.decoder_fast'
========================================================= short test summary info ==========================================================
ERROR tests/avro/test_decoder.py
ERROR tests/avro/test_reader.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 2 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
==================================================== 865 deselected, 2 errors in 23.91s ====================================================
make: *** [Makefile:42: test] Error 2
I was able to fix the test errors by following the Cython installation instructions documented here and then rebuilding.
~/iceberg-python$ sudo apt-get install build-essential python3-dev
~/iceberg-python$ pip install --user Cython
~/iceberg-python$ poetry build
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.