BUG: Pyarrow dependency warning starts with newline which makes it impossible to filter out by message with -W or PYTHONWARNINGS

Open lesteve opened this issue 7 months ago • 15 comments

Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the main branch of pandas.

Issue Description

I would expect that the Pyarrow warning is easily ignorable by message i.e. something like the following does not produce any warning:

python -W 'ignore:Pyarrow:DeprecationWarning' -c 'import pandas'

The issue is that, because the warning starts with a newline, it cannot be targeted by message. This seems to be the case for any warning that starts with a whitespace character.

I tried a variety of things like -W 'ignore:\nPyarrow:DeprecationWarning' but I could not make it work. I think the reason is the "ignoring any whitespace at the start or end of message" from the warnings documentation:

In -W and PYTHONWARNINGS, message is a literal string that the start of the warning message must contain (case-insensitively), ignoring any whitespace at the start or end of message.
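The behaviour quoted above can be reproduced without pandas. The following is a minimal sketch under stated assumptions: `run_with_filter` is a hypothetical helper, and the warning text is a shortened stand-in for the real pandas message. It starts a child interpreter with `-W`, emits a DeprecationWarning there, and returns whatever the child wrote to stderr.

```python
import os
import subprocess
import sys

FILTER = "ignore:Pyarrow:DeprecationWarning"


def run_with_filter(message, w_option=FILTER):
    """Emit `message` as a DeprecationWarning in a child interpreter started
    with `-W w_option` and return the child's stderr (hypothetical helper)."""
    code = f"import warnings; warnings.warn({message!r}, DeprecationWarning)"
    # Drop PYTHONWARNINGS so that only the -W option under test is in effect.
    env = {k: v for k, v in os.environ.items() if k != "PYTHONWARNINGS"}
    result = subprocess.run(
        [sys.executable, "-W", w_option, "-c", code],
        capture_output=True,
        text=True,
        env=env,
    )
    return result.stderr


# Shortened stand-ins for the real pandas message (assumption).
with_newline = run_with_filter("\nPyarrow will become a required dependency")
without_newline = run_with_filter("Pyarrow will become a required dependency")

print("leading newline, warning still shown:", "Pyarrow" in with_newline)
print("no leading newline, warning still shown:", "Pyarrow" in without_newline)
```

Only the message without the leading newline should be silenced, since the filter text has to match at the very start of the warning message.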

In scikit-learn tests, warnings are turned into errors with pytest -W, and we would like to use -W as well to ignore the Pyarrow dependency warning for now. A possible workaround is to use filterwarnings, which accepts a regex (see the Note in https://docs.pytest.org/en/stable/how-to/capture-warnings.html#controlling-warnings for more details), but being able to use -W or PYTHONWARNINGS seems desirable, e.g. to control warnings from the command line outside of pytest.
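For reference, the filterwarnings workaround could look roughly like the sketch below. It uses only the standard `warnings` module; the message text is a shortened stand-in for the real pandas warning (an assumption), and `\s*` is just one possible way to skip the leading newline.

```python
import warnings

# Shortened stand-in for the real pandas message (assumption).
MESSAGE = "\nPyarrow will become a required dependency of pandas"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record everything not explicitly ignored
    # `message` is a regex matched at the start of the warning text, so the
    # leading whitespace can be skipped with `\s*`.
    warnings.filterwarnings(
        "ignore", message=r"\s*Pyarrow", category=DeprecationWarning
    )
    warnings.warn(MESSAGE, DeprecationWarning)
    warnings.warn("unrelated deprecation", DeprecationWarning)

# Only the unrelated warning should have been recorded.
remaining = [str(w.message) for w in caught]
print(remaining)
```

This only helps where Python code (or pytest configuration) can install the filter; it does not address the command-line case.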

Expected Behavior

There should be an easy way with -W or PYTHONWARNINGS to ignore the Pyarrow DeprecationWarning, e.g.:

python -W 'ignore:Pyarrow:DeprecationWarning' -c 'import pandas'

Installed Versions

INSTALLED VERSIONS
------------------
commit : f538741432edf55c6b9fb5d0d496d2dd1d7c2457
python : 3.11.4.final.0
python-bits : 64
OS : Linux
OS-release : 6.7.0-arch3-1
Version : #1 SMP PREEMPT_DYNAMIC Sat, 13 Jan 2024 14:37:14 +0000
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.0
numpy : 1.26.3
pytz : 2023.3
dateutil : 2.8.2
setuptools : 68.0.0
pip : 23.2
Cython : 0.29.35
pytest : 7.4.0
hypothesis : None
sphinx : 6.0.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.18.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.0
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.12.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : 0.21.0
tzdata : 2023.3
qtpy : None
pyqt5 : None

lesteve avatar Jan 26 '24 06:01 lesteve

This works: (?s).*Pyarrow will become a required dependency of pandas:DeprecationWarning

jorenham avatar Jan 26 '24 09:01 jorenham

python -W and PYTHONWARNINGS do not handle regexes (see the link to the Python documentation above), so no, this does not work:

❯ python -W 'ignore:(?s).*Pyarrow:DeprecationWarning' -c 'import pandas' 
<string>:1: DeprecationWarning: 
Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),
(to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries)
but was not found to be installed on your system.
If this would cause problems for you,
please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466

As the doc says, regexes are accepted by warnings.filterwarnings, and your regex likely works there.
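The difference can be sketched with plain `re`, under the assumption that `-W` and PYTHONWARNINGS treat the message as literal text (modelled here with `re.escape`), while `warnings.filterwarnings` compiles it as a case-insensitive regex matched at the start of the warning text. The message is a shortened stand-in for the real one.

```python
import re

# Shortened stand-in for the real pandas message (assumption).
text = "\nPyarrow will become a required dependency of pandas"
pattern = "(?s).*Pyarrow"

# warnings.filterwarnings: the pattern is used as a regex.
as_regex = re.compile(pattern, re.I).match(text)

# -W / PYTHONWARNINGS: the pattern is taken literally (modelled with re.escape).
as_literal = re.compile(re.escape(pattern), re.I).match(text)

print("matches as regex:  ", as_regex is not None)
print("matches as literal:", as_literal is not None)
```

Taken literally, the filter would only match a warning whose text actually begins with the characters `(?s).*Pyarrow`.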

lesteve avatar Jan 26 '24 09:01 lesteve

Ah I didn't know that. I assumed it would, since it did work within my pytest config.

jorenham avatar Jan 26 '24 09:01 jorenham

Yeah, filterwarnings accepts a regex, contrary to -W. It is quite a quirk, which I did not know about either until very recently. This is mentioned in the pytest doc: https://docs.pytest.org/en/stable/how-to/capture-warnings.html#controlling-warnings

lesteve avatar Jan 26 '24 09:01 lesteve

Thanks for the report, a linked PR is already up to resolve this.

rhshadrach avatar Jan 26 '24 10:01 rhshadrach

Nice, thanks! This is https://github.com/pandas-dev/pandas/pull/57003 indeed. I looked for an open PR but didn't find it somehow ...

lesteve avatar Jan 26 '24 11:01 lesteve

Just to be sure, I stated "already up" as more of a celebration and not an indication this shouldn't have been reported. I would only expect searching the issue tracker and not PRs.

rhshadrach avatar Jan 26 '24 11:01 rhshadrach

Yep that's the way I understood it! I am quite happy someone had a similar issue as me and took the time to open a PR in pandas with some additional explanations.

To me, it seems quite a weird Python quirk that it is not possible to filter out a message starting with whitespace with -W.

lesteve avatar Jan 26 '24 13:01 lesteve

@lithomas1 - I've marked this as a blocker for 2.2.1. Either we should remove the newline, or the entire deprecation, but either way I think this issue needs to be resolved for 2.2.1.

rhshadrach avatar Feb 07 '24 21:02 rhshadrach

Yep, this is on my radar - was planning to bring it up at the meeting next week.

lithomas1 avatar Feb 07 '24 22:02 lithomas1

Preemptively +100 to remove the deprecation warning in 2.2.1

mroeschke avatar Feb 07 '24 22:02 mroeschke

Could you also remove the entire GitHub link from the message, or at least the https:? The colon is a problem because the warnings package splits the option only on colons. So it's actually impossible to use PYTHONWARNINGS anyway. https://github.com/python/cpython/blob/5914a211ef5542edd1f792c2684e373a42647b04/Lib/warnings.py#L221

For example, you can see the problem here:

PYTHONWARNINGS=ignore:'message\nhttps:notacategory' python
Invalid -W option ignored: unknown warning category: 'notacategory'

I believe it's easier to change this message than to ask CPython to fix it. I will still file a bug report on CPython.
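A small sketch of the splitting problem, assuming the option string is simply split on ":" as described above. The filter spec is hypothetical and only serves to show where the URL gets cut.

```python
# Hypothetical filter spec whose message part includes a URL (illustration only).
spec = "ignore:feedback at https://github.com/pandas-dev/pandas/issues/54466:DeprecationWarning"

# Split on ":" the way the comment above describes for -W / PYTHONWARNINGS.
fields = spec.split(":")
action, message, category = fields[:3]

print(fields)
print("message field: ", repr(message))
print("category field:", repr(category))
```

The message field ends at `https`, and the rest of the URL lands in the category position, which is not a valid warning category.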

mimizone avatar Feb 09 '24 03:02 mimizone

@mimizone you probably know this already but you don't have to use the full message, it only needs to match at the beginning. Since "https://" is towards the end of the message, I don't think it matters.

For example if there was not the newline at the start you could ignore the message like this:

PYTHONWARNINGS='ignore:Pyarrow:DeprecationWarning'

or

PYTHONWARNINGS='ignore:Pyarrow will become a required dependency:DeprecationWarning'

lesteve avatar Feb 09 '24 05:02 lesteve

thanks @lesteve for clarifying this point

mimizone avatar Feb 09 '24 16:02 mimizone

In the meantime, in case somebody is interested in filtering this in the context of Jupyter: I did it using the IPython config, which works well in my context. https://discourse.jupyter.org/t/hide-deprecationwarning-via-the-server-config/23864/5?u=jhuylebroeck

mimizone avatar Feb 17 '24 03:02 mimizone

The warning ended up being removed in https://github.com/pandas-dev/pandas/pull/57556, so closing. The warning will not appear in 2.2.1.

mroeschke avatar Feb 22 '24 18:02 mroeschke