pandas
BUG: union of lists of DatetimeIndex generates ValueErrors
Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
a = pd.date_range(start=pd.Timestamp.today() - pd.Timedelta(days=30), end=pd.Timestamp.today())
b = [a[:5], a[5:10], a[5:7], a[10:]]
b[0].union(b[1:])
# The second example, which also fails:
# b[0].union(b[1:2])
Issue Description
The first example fails with ValueError: Lengths must match to compare, raised while performing a monotonicity check (File "pandas/core/indexes/base.py", line 6672, in _cmp_method: raise ValueError("Lengths must match to compare")).
The second example fails with ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(), also raised while performing a monotonicity check (File "pandas/_libs/algos.pyx", line 754, in pandas._libs.algos.is_monotonic).
union_many() works fine but emits a deprecation warning.
Expected Behavior
import pandas as pd
a = pd.date_range(start=pd.Timestamp.today() - pd.Timedelta(days=30), end=pd.Timestamp.today())
b = [a[:5], a[5:10], a[5:7], a[10:]]
b[0].union_many(b[1:])
Installed Versions
INSTALLED VERSIONS
commit : 06d230151e6f18fdb8139d09abf539867a8cd481
python : 3.10.4.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-122-generic
Version : #138-Ubuntu SMP Wed Jun 22 15:00:31 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8
pandas : 1.4.1
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
pip : 22.0.4
setuptools : 60.10.0
Cython : None
pytest : None
hypothesis : None
sphinx : 4.4.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.8.0
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.0
IPython : 7.32.0
pandas_datareader : 0.10.0
bs4 : 4.10.0
bottleneck : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.5.1
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.8.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
I wouldn't call this a bug, as union was meant to work with a single DatetimeIndex only, rather than a list[DatetimeIndex]. For that purpose union_many is supposed to be used, which is being deprecated, as you observed. Maybe use functools.reduce?
from functools import reduce
reduce(lambda x, y: x.union(y), b)
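For anyone who wants to try the reduce suggestion in isolation, here is a minimal self-contained sketch. It swaps the original pd.Timestamp.today() range for fixed dates, which is an assumption made only so the result is reproducible; the list b has the same shape as in the report.

```python
from functools import reduce

import pandas as pd

# Fixed dates instead of Timestamp.today() (an assumption, for reproducibility).
a = pd.date_range(start="2022-01-01", periods=31, freq="D")
b = [a[:5], a[5:10], a[5:7], a[10:]]

# Fold the list pairwise: ((b[0] | b[1]) | b[2]) | b[3].
# Each step calls union with a single DatetimeIndex, which is the supported usage.
combined = reduce(lambda left, right: left.union(right), b)

print(len(combined), combined.is_unique, combined.is_monotonic_increasing)
```

Because each step is an ordinary two-index union, overlapping pieces such as a[5:10] and a[5:7] are deduplicated along the way.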
Is the plan to remove the functionality of union_many when it is deprecated, or is union going to be expanded to incorporate the functionality?
union_many is a wrapper around union, so there is no performance impact from removing it.
Closing. Please ping if there is anything we are missing.
I don't understand the response. Either union_many should be maintained, or union should be adapted to provide the same usage as union_many. Otherwise you are reducing the usability of the package. To be clear:
If you are deprecating union_many, you should not advise users to use obj.union in the deprecation warning, as it does not offer the same functionality.
You can try append.
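A rough sketch of what the append route could look like, assuming the goal is a sorted, duplicate-free result like the one union would give. Index.append accepts a list of indexes but only concatenates them, so the unique() and sort_values() steps are additions here, not something append does on its own. The fixed dates are again an assumption for reproducibility.

```python
import pandas as pd

# Fixed dates instead of Timestamp.today() (an assumption, for reproducibility).
a = pd.date_range(start="2022-01-01", periods=31, freq="D")
b = [a[:5], a[5:10], a[5:7], a[10:]]

# append takes a list of indexes and concatenates them, duplicates included.
stacked = b[0].append(b[1:])

# Drop the duplicates and sort to approximate a union of all the pieces.
combined = stacked.unique().sort_values()

print(len(stacked), len(combined))
```

Unlike the reduce approach, this builds one concatenated index first and cleans it up afterwards, so the intermediate result contains the overlapping dates twice.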
We are a volunteer project. You are welcome to contribute if you think some warnings are not clear.