to_pandas() and to_arrow() fail when a column name in row_filter differs in case from the actual column, even if case_sensitive is set to False in scan
Apache Iceberg version
0.7.1 (latest release)
Please describe the bug 🐞
I have a table with a column business_time_hour (all lower case). If I capitalize one character of the column name in the row filter, to_pandas() (and to_arrow()) fails. My code snippet is:
from pyiceberg import catalog
cat = catalog.load_catalog(**{'type':'glue'})
table = cat.load_table('namespace' + '.' + 'table')
scan = table.scan(row_filter = "business_time_Hour = '2024-09-15 05:00:00'", case_sensitive = False)
The above line passes
df = scan.to_pandas()
The above line fails with the following message
ValueError: Could not find field with name business_time_Hour, case_sensitive=True
Please note that 1. I set case_sensitive to False in the scan method. 2. character 'H' in business_time_hour is capitalized while the actual column name has all lower case letters.
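To make the expected behaviour concrete, here is a minimal standalone sketch of a name lookup that honours a case_sensitive flag. It is not pyiceberg code: find_field and the column list are hypothetical, and only the error text is copied from the message above.

```python
from typing import Iterable


def find_field(columns: Iterable[str], name: str, case_sensitive: bool = True) -> str:
    """Return the stored column name matching `name`, or raise ValueError.

    Hypothetical helper for illustration only; not part of pyiceberg.
    """
    if case_sensitive:
        # Exact match: keys are the column names as stored.
        lookup = {c: c for c in columns}
        key = name
    else:
        # Case-insensitive match: compare lower-cased names on both sides.
        lookup = {c.lower(): c for c in columns}
        key = name.lower()
    if key not in lookup:
        raise ValueError(
            f"Could not find field with name {name}, case_sensitive={case_sensitive}"
        )
    return lookup[key]


columns = ["business_time_hour", "value"]

# With case_sensitive=False the mixed-case name resolves to the stored column.
print(find_field(columns, "business_time_Hour", case_sensitive=False))

# With case_sensitive=True the same name is rejected.
try:
    find_field(columns, "business_time_Hour", case_sensitive=True)
except ValueError as exc:
    print(exc)
```

The report above amounts to saying that the scan behaves like the second call even though case_sensitive=False was passed.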
Thanks for reporting this. It might be due to a bug we recently fixed in #1147 (https://github.com/apache/iceberg-python/pull/1147).
Can you try it against the latest main branch?
Do I need to install a different pyiceberg version to confirm this?
You can test out the main branch by downloading the repo and installing the library in editable mode.
Run this command in your iceberg-python repo directory:
pip install -e .
You can switch back to the official version with:
pip install pyiceberg --force-reinstall
Hi Kevin,
Sorry for the late reply.
So do I go to ~/opt/anaconda3/envs/py311/lib/python3.11/site-packages/pyiceberg and type:
~/opt/anaconda3/envs/py311/bin/pip3 install -e .
I ask because that gives me an error:
ImportError: cannot import name 'GenericAlias' from partially initialized module 'types' (most likely due to a circular import) (/Users/leonid.makarovsky/opt/anaconda3/envs/py311/lib/python3.11/site-packages/pyiceberg/types.py)
Is there any way to just pip install a specific version for testing, something like pip install pyiceberg==0.7.1fixtemp? Thanks.
--Leonid
Since you're using conda, maybe try something like this to install directly from the GitHub source:
https://stackoverflow.com/questions/19042389/conda-installing-upgrading-directly-from-github
pip install git+https://github.com/apache/iceberg-python.git
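After installing from source, it can help to confirm which pyiceberg build the environment actually picks up before re-running the scan. A minimal sketch using only the standard library follows. installed_version is a hypothetical helper, and what version string a build from main reports is not stated in this thread.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string of the active pyiceberg install, or None.
print(installed_version("pyiceberg"))
```

If this still prints 0.7.1, the environment is most likely still using the released package rather than the source install.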
This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.