BUG: union of lists of DatetimeIndex generates ValueErrors

Open btodac opened this issue 2 years ago • 2 comments

Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
a = pd.date_range(start=pd.Timestamp.today() - pd.Timedelta(days=30), end=pd.Timestamp.today())
b = [a[:5], a[5:10], a[5:7], a[10:]]
b[0].union(b[1:])
# and
#b[0].union(b[1:2])

Issue Description

The first example fails with "ValueError: Lengths must match to compare" when performing a monotonic data check (File "pandas/core/indexes/base.py", line 6672, in _cmp_method: raise ValueError("Lengths must match to compare")).

The second example fails with "ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()", also when performing a monotonic data check (File "pandas/_libs/algos.pyx", line 754, in pandas._libs.algos.is_monotonic).

union_many() works fine but emits a deprecation warning.

Expected Behavior

import pandas as pd
a = pd.date_range(start=pd.Timestamp.today() - pd.Timedelta(days=30), end=pd.Timestamp.today())
b = [a[:5], a[5:10], a[5:7], a[10:]]
b[0].union_many(b[1:])

Installed Versions

INSTALLED VERSIONS

commit : 06d230151e6f18fdb8139d09abf539867a8cd481
python : 3.10.4.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-122-generic
Version : #138-Ubuntu SMP Wed Jun 22 15:00:31 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.4.1
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
pip : 22.0.4
setuptools : 60.10.0
Cython : None
pytest : None
hypothesis : None
sphinx : 4.4.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.8.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.0
IPython : 7.32.0
pandas_datareader: 0.10.0
bs4 : 4.10.0
bottleneck : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.5.1
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.8.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None

btodac, Aug 03 '22 09:08

I wouldn't call this a bug: union is meant to take a single DatetimeIndex, not a list[DatetimeIndex]. union_many covers that case, but it is being deprecated, as you observed. Maybe use functools.reduce? reduce(lambda x, y: x.union(y), b)
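A minimal sketch of the functools.reduce suggestion, spelled out so it runs on its own. The fixed start date and the variable name combined are assumptions made for reproducibility; the original report builds the range from pd.Timestamp.today().

```python
from functools import reduce

import pandas as pd

# Fixed 31-day range (an assumption; the report uses today's date instead).
a = pd.date_range(start="2022-07-01", periods=31, freq="D")
b = [a[:5], a[5:10], a[5:7], a[10:]]

# Fold the list pairwise: ((b[0] | b[1]) | b[2]) | b[3].
# Each step is an ordinary Index.union call between two indexes.
combined = reduce(lambda x, y: x.union(y), b)

print(len(combined))
```

Because each step is a plain two-index union, the overlapping slice a[5:7] contributes nothing new and the result covers the same dates as a.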

pratyushsharan, Aug 09 '22 04:08

Is the plan to remove the functionality of union_many when it is deprecated or is union going to be expanded to incorporate the functionality?

btodac, Aug 09 '22 09:08

union_many is a wrapper around union, so there is no performance impact when removing this.

phofl, Aug 15 '22 16:08

Closing. Please ping if there is anything we are missing.

phofl, Aug 19 '22 15:08

I don't understand the response. Either union_many should be maintained, or union should be adapted to support the same usage as union_many. Otherwise you are reducing the usability of the package. To be clear: if you are deprecating union_many, the deprecation warning should not advise users to use obj.union, as it does not offer the same functionality.

btodac, Aug 31 '22 19:08

You can try append.
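A rough sketch of what the append route could look like, assuming the goal is the same sorted, de-duplicated result that union_many gave. Index.append accepts a list of indexes but simply concatenates them, so the unique() and sort_values() steps are an addition here and not something stated in this thread; the fixed start date is also an assumption.

```python
import pandas as pd

# Fixed 31-day range (an assumption; the report uses today's date instead).
a = pd.date_range(start="2022-07-01", periods=31, freq="D")
b = [a[:5], a[5:10], a[5:7], a[10:]]

# append takes a list of indexes and concatenates them, duplicates included.
stacked = b[0].append(b[1:])

# Drop the repeated dates from a[5:7] and sort to mimic union semantics.
merged = stacked.unique().sort_values()

print(len(stacked), len(merged))
```

Note that append alone is not a drop-in replacement for a union: without the extra steps the overlapping dates appear twice.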

We are a volunteer project. You are welcome to contribute if you think some warnings are not clear.

phofl, Aug 31 '22 20:08