Fast Avro Decoder not included in Conda Deployment of pyiceberg

Open aschreiber1 opened this issue 1 year ago • 7 comments

Feature Request / Improvement

When you install pyiceberg via conda, you get warnings like:

/home/coder/.conda/envs/coder/lib/python3.10/site-packages/pyiceberg/avro/decoder.py:185: UserWarning: Falling back to pure Python Avro decoder, missing Cython implementation
  warnings.warn("Falling back to pure Python Avro decoder, missing Cython implementation")

This means the conda package is missing the fast Avro decoder. It would be great to have it included to speed up our queries!

aschreiber1 avatar Aug 22 '24 18:08 aschreiber1

I'm not sure how conda deals with Cython extensions, but here's the relevant code:

https://github.com/apache/iceberg-python/blob/e4c1748fee220076f04e35ab2f182dd51ca20705/pyiceberg/avro/decoder.py#L185

https://github.com/apache/iceberg-python/blob/e4c1748fee220076f04e35ab2f182dd51ca20705/build-module.py#L44-L52
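For readers who don't want to follow the links: the warning suggests a try-import-then-fall-back pattern. The snippet below is only a minimal sketch of that pattern, not the actual pyiceberg code. `load_decoder_class` and `PurePythonDecoder` are hypothetical names made up for illustration; only the module name `pyiceberg.avro.decoder_fast`, the class name `CythonBinaryDecoder`, and the warning text come from this thread.

```python
import importlib
import warnings


class PurePythonDecoder:
    """Hypothetical stand-in for a slower, pure Python decoder."""


def load_decoder_class(fast_module: str, fast_name: str, fallback: type) -> type:
    """Return `fast_name` from `fast_module`, or `fallback` if the module is missing."""
    try:
        module = importlib.import_module(fast_module)
    except ModuleNotFoundError:
        # Same message as the warning reported in this issue.
        warnings.warn("Falling back to pure Python Avro decoder, missing Cython implementation")
        return fallback
    return getattr(module, fast_name)


if __name__ == "__main__":
    # Prints the class that would be used in the current environment.
    cls = load_decoder_class(
        "pyiceberg.avro.decoder_fast", "CythonBinaryDecoder", PurePythonDecoder
    )
    print(cls.__name__)
```

If the compiled extension was never built or packaged, the import fails and the fallback is used, which would match what the conda install shows.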

kevinjqliu avatar Aug 31 '24 13:08 kevinjqliu

I see the same issue when installing from PyPI (Windows):

D:\code\...\.venv\Lib\site-packages\pyiceberg\avro\decoder.py:185: UserWarning: Falling back to pure Python Avro decoder, missing Cython implementation

https://github.com/apache/iceberg-python/blob/9b9ed534b2022cb9a687f4ed876fadcc2457b31b/pyiceberg/avro/decoder.py#L177-L187

JanKrl avatar Sep 10 '24 10:09 JanKrl

Adding some context to the Avro decoder build process.

We use Poetry to build the Avro decoder via this script https://github.com/apache/iceberg-python/blob/b5933756b5b488ec51cd56d5984731b6cc347f2b/pyproject.toml#L583

https://github.com/apache/iceberg-python/blob/b5933756b5b488ec51cd56d5984731b6cc347f2b/build-module.py

You can trigger the build manually with:

poetry build

Depending on the platform, a missing piece may be preventing the build from succeeding.

@JanKrl can you try the above command and paste the output here for debugging?
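Before running the build, it may help to check for the usual prerequisites of a Cython extension. This is a rough sketch under my own assumptions about what "missing piece" could mean (a C compiler on PATH, the Cython package, and the Python development headers); `build_prerequisites` is a hypothetical helper and is not part of pyiceberg or Poetry.

```python
import importlib.util
import shutil
import sysconfig
from pathlib import Path


def build_prerequisites() -> dict:
    """Report which common C-extension build prerequisites appear to be present."""
    # Any of these compiler executables on PATH counts; the list is a guess, not exhaustive.
    compiler = any(shutil.which(name) for name in ("cc", "gcc", "clang", "cl"))
    # Cython must be importable in the environment that runs the build.
    cython = importlib.util.find_spec("Cython") is not None
    # Python.h is shipped with the Python development headers (e.g. python3-dev on Debian/Ubuntu).
    include_dir = sysconfig.get_paths()["include"]
    headers = Path(include_dir, "Python.h").is_file()
    return {"c_compiler": compiler, "cython": cython, "python_headers": headers}


if __name__ == "__main__":
    for name, present in build_prerequisites().items():
        print(f"{name}: {'ok' if present else 'MISSING'}")
```

A False entry is only a hint; build tools can locate compilers and headers in ways this check does not cover.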

kevinjqliu avatar Sep 10 '24 16:09 kevinjqliu

Deleting the Python environment (env) solved the problem for me, so (un)fortunately I can no longer reproduce it.

JanKrl avatar Sep 11 '24 14:09 JanKrl

Great to hear. Did you clean the env manually or use make clean?

kevinjqliu avatar Sep 11 '24 16:09 kevinjqliu

I use venv, so I removed the .venv directory and created it again.

JanKrl avatar Sep 11 '24 16:09 JanKrl

I must have missed the "Falling back to pure Python Avro decoder, missing Cython implementation" warning during installation, but I was indeed missing Cython, which was causing a different decoder_fast related error during make test:

~/iceberg-python$ make test
poetry run pytest tests/ -m "(unmarked or parametrize) and not integration"
=========================================================== test session starts ============================================================
platform linux -- Python 3.10.12, pytest-7.4.4, pluggy-1.5.0
rootdir: /mnt/c/Users/.../iceberg-python
configfile: pyproject.toml
plugins: mock-3.14.0, checkdocs-2.10.1, requests-mock-1.12.1, lazy-fixture-0.6.3
collected 3588 items / 2 errors / 865 deselected / 2723 selected

================================================================== ERRORS ==================================================================
_______________________________________________ ERROR collecting tests/avro/test_decoder.py ________________________________________________
ImportError while importing test module '/mnt/c/Users/.../iceberg-python/tests/avro/test_decoder.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/avro/test_decoder.py:29: in <module>
    from pyiceberg.avro.decoder_fast import CythonBinaryDecoder
E   ModuleNotFoundError: No module named 'pyiceberg.avro.decoder_fast'
________________________________________________ ERROR collecting tests/avro/test_reader.py ________________________________________________
ImportError while importing test module '/mnt/c/Users/.../iceberg-python/tests/avro/test_reader.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
/usr/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
tests/avro/test_reader.py:24: in <module>
    from pyiceberg.avro.decoder_fast import CythonBinaryDecoder
E   ModuleNotFoundError: No module named 'pyiceberg.avro.decoder_fast'
========================================================= short test summary info ==========================================================
ERROR tests/avro/test_decoder.py
ERROR tests/avro/test_reader.py
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 2 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
==================================================== 865 deselected, 2 errors in 23.91s ====================================================
make: *** [Makefile:42: test] Error 2

I was able to fix the test errors by following the Cython installation instructions documented here and then rebuilding.

~/iceberg-python$ sudo apt-get install build-essential python3-dev
~/iceberg-python$ pip install --user Cython
~/iceberg-python$ poetry build

fcrimins avatar Sep 23 '24 21:09 fcrimins

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] avatar Mar 23 '25 00:03 github-actions[bot]