iceberg-python

to_pandas(), to_arrow() fail because case_sensitive doesn't work if column in row_filter doesn't match the case even if case_sensitive is set to False in scan

Open leonidmakarovsky opened this issue 1 year ago • 5 comments

Apache Iceberg version

0.7.1 (latest release)

Please describe the bug 🐞

I have a table with a column business_time_hour (all lowercase). If I capitalize one character of that name in the row_filter, to_pandas() (and to_arrow()) fails. My code snippet is:

from pyiceberg import catalog
cat = catalog.load_catalog(**{'type': 'glue'})
table = cat.load_table('namespace' + '.' + 'table')
scan = table.scan(row_filter="business_time_Hour = '2024-09-15 05:00:00'", case_sensitive=False)

The above line passes

df = scan.to_pandas()

The above line fails with the following message

ValueError: Could not find field with name business_time_Hour, case_sensitive=True

Please note that: 1. I set case_sensitive to False in the scan method. 2. The 'H' in business_time_Hour is capitalized in the row_filter, while the actual column name is all lowercase.
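To make the expected behaviour concrete, here is a minimal sketch of case-insensitive column resolution. It is not pyiceberg's implementation; `resolve_column` is a hypothetical helper written only for illustration, and its error message simply mirrors the one reported above.

```python
from typing import Iterable


def resolve_column(name: str, columns: Iterable[str], case_sensitive: bool = True) -> str:
    """Return the schema's spelling of `name`, or raise ValueError if absent.

    Hypothetical helper for illustration only; not part of pyiceberg.
    """
    columns = list(columns)
    if case_sensitive:
        # Exact match required.
        if name in columns:
            return name
    else:
        # Map lower-cased names back to the spelling stored in the schema.
        by_lower = {col.lower(): col for col in columns}
        match = by_lower.get(name.lower())
        if match is not None:
            return match
    raise ValueError(
        f"Could not find field with name {name}, case_sensitive={case_sensitive}"
    )


columns = ["business_time_hour", "price"]
print(resolve_column("business_time_Hour", columns, case_sensitive=False))
```

With case_sensitive=False the lookup should succeed and return the lowercase spelling; with case_sensitive=True it raises a ValueError. Until a fixed release is installed, writing the column name in the row_filter exactly as it appears in the schema sidesteps the lookup problem.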

leonidmakarovsky • Sep 16 '24 04:09

Thanks for reporting this. It might be due to a bug we recently fixed in #1147.

Can you try it against the latest main branch?

kevinjqliu • Sep 16 '24 18:09

Do I need to install a different pyiceberg version to confirm this?

leonidmakarovsky • Sep 17 '24 19:09

You can test out the main branch by cloning the repo and installing the library in editable mode.

Run this command in your iceberg-python repo directory:

pip install -e .

You can switch back to the official release with:

pip install pyiceberg --force-reinstall
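After switching between installs, it can help to confirm which build the environment actually sees. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name, and the string it returns is whatever the installed package metadata reports.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version recorded in the environment's package metadata,
# or None if pyiceberg is not installed in this interpreter.
print(installed_version("pyiceberg"))
```

Run it with the same interpreter used for the scan (for example, the one from the conda environment) so that the check reflects the environment that matters.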

kevinjqliu • Sep 17 '24 20:09

Hi Kevin,

Sorry for the late reply.

So do I go to ~/opt/anaconda3/envs/py311/lib/python3.11/site-packages/pyiceberg and type:

~/opt/anaconda3/envs/py311/bin/pip3 install -e .

I ask because that gives me an error:

ImportError: cannot import name 'GenericAlias' from partially initialized module 'types' (most likely due to a circular import) (/Users/leonid.makarovsky/opt/anaconda3/envs/py311/lib/python3.11/site-packages/pyiceberg/types.py)

Is there any way to just pip install a specific version for testing, like pip install pyiceberg==0.7.1fixtemp? Thanks.

--Leonid

leonidmakarovsky • Sep 20 '24 14:09

Since you're using conda, maybe try something like this to install directly from the GitHub source:

https://stackoverflow.com/questions/19042389/conda-installing-upgrading-directly-from-github

pip install git+https://github.com/apache/iceberg-python.git

kevinjqliu • Sep 20 '24 16:09

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] • Mar 20 '25 00:03