MultiIndex with RangeIndex as first column is not consistent with RangeIndex

Open vasil-pashov opened this issue 9 months ago • 0 comments

Describe the bug

ArcticDB symbols indexed by a pd.RangeIndex are subject to some constraints:

  • When appending, the start of the new index must be the same as the end of the current data
  • The step of the appended index must match the step of the current data
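
The two constraints can be sketched in plain pandas as follows. This is only an illustration of the rule as stated above, not ArcticDB's implementation: `check_range_append` is a hypothetical helper, and it assumes "the end of the current data" means the `stop` of the existing RangeIndex.

```python
import pandas as pd


def check_range_append(existing: pd.RangeIndex, new: pd.RangeIndex) -> None:
    """Hypothetical helper: raise ValueError if `new` cannot follow `existing`."""
    # Constraint 2: the step of the appended index must match the current step.
    if new.step != existing.step:
        raise ValueError(f"step mismatch: existing {existing.step}, new {new.step}")
    # Constraint 1: the appended index must start where the current data ends
    # (assumed here to be the `stop` of the existing RangeIndex).
    if new.start != existing.stop:
        raise ValueError(f"start mismatch: expected {existing.stop}, got {new.start}")


# A contiguous append passes silently.
check_range_append(pd.RangeIndex(0, 10), pd.RangeIndex(10, 16))

# The ranges used in this issue leave a gap (10..14), so the check fails.
try:
    check_range_append(pd.RangeIndex(0, 10), pd.RangeIndex(15, 21))
except ValueError as err:
    print(err)
```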

When the RangeIndex is the first level of a MultiIndex, these constraints are not applied. As a result, one can create a MultiIndex-ed DataFrame whose main index is out of order.

Steps/Code to Reproduce

import pandas as pd
import numpy as np
import arcticdb as adb
dates1 = pd.date_range("01/01/2024", "01/10/2024")
dates2 = pd.date_range("01/15/2024", "01/20/2024")
rowrange1 = pd.RangeIndex(start=0, stop=10)
rowrange2 = pd.RangeIndex(start=15, stop=21)
midx1 = pd.MultiIndex.from_arrays([rowrange1, dates1], names=["datetime", "level"])
midx2 = pd.MultiIndex.from_arrays([rowrange2, dates2], names=["datetime", "level"])

ac = adb.Arctic("lmdb://test")
lib = ac.get_library("test", create_if_missing=True)
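lib.write("test", pd.DataFrame({"col": range(0, len(midx1))}, index=midx1))
# The first level starts at 15 instead of 10, which breaks the start constraint.
lib.append("test", pd.DataFrame({"col": range(0, len(midx2))}, index=midx2))
# The first level restarts at 0, so the main index ends up out of order.
lib.append("test", pd.DataFrame({"col": range(0, len(midx1))}, index=midx1))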

print(lib.read("test").data)

Output

                     col
datetime level
0        2024-01-01    0
1        2024-01-02    1
2        2024-01-03    2
3        2024-01-04    3
4        2024-01-05    4
5        2024-01-06    5
6        2024-01-07    6
7        2024-01-08    7
8        2024-01-09    8
9        2024-01-10    9
15       2024-01-15    0
16       2024-01-16    1
17       2024-01-17    2
18       2024-01-18    3
19       2024-01-19    4
20       2024-01-20    5
0        2024-01-01    0
1        2024-01-02    1
2        2024-01-03    2
3        2024-01-04    3
4        2024-01-05    4
5        2024-01-06    5
6        2024-01-07    6
7        2024-01-08    7
8        2024-01-09    8
9        2024-01-10    9

Expected Results

The same constraints should apply to a pd.RangeIndex when it is part of a MultiIndex. ArcticDB should raise an exception in the case above.
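
One way to picture the requested behaviour is the pandas-only sketch below. It is an assumption about what such a check could look like, not ArcticDB code: `first_level_as_range` and `check_multiindex_append` are hypothetical names. Because `pd.MultiIndex.from_arrays` stores the first level as plain integers, the sketch first recovers the range from the level values and then applies the same start and step rules.

```python
import numpy as np
import pandas as pd


def first_level_as_range(index: pd.MultiIndex) -> pd.RangeIndex:
    """Hypothetical helper: rebuild the range described by level 0, or raise."""
    values = index.get_level_values(0).to_numpy()
    if len(values) == 0:
        raise ValueError("empty index")
    # With a single row the step cannot be inferred; assume 1 in that case.
    step = int(values[1] - values[0]) if len(values) > 1 else 1
    expected_diffs = np.full(len(values) - 1, step)
    if step == 0 or not np.array_equal(np.diff(values), expected_diffs):
        raise ValueError("level 0 is not an evenly spaced range")
    return pd.RangeIndex(int(values[0]), int(values[-1]) + step, step)


def check_multiindex_append(existing: pd.MultiIndex, new: pd.MultiIndex) -> None:
    """Hypothetical helper: apply the RangeIndex append rules to level 0."""
    old_range = first_level_as_range(existing)
    new_range = first_level_as_range(new)
    if new_range.step != old_range.step:
        raise ValueError(f"step mismatch: existing {old_range.step}, new {new_range.step}")
    if new_range.start != old_range.stop:
        raise ValueError(f"start mismatch: expected {old_range.stop}, got {new_range.start}")


# Same shape as the indexes in the reproduction above.
midx1 = pd.MultiIndex.from_arrays(
    [pd.RangeIndex(0, 10), pd.date_range("2024-01-01", periods=10)]
)
midx2 = pd.MultiIndex.from_arrays(
    [pd.RangeIndex(15, 21), pd.date_range("2024-01-15", periods=6)]
)

try:
    check_multiindex_append(midx1, midx2)
except ValueError as err:
    print(err)
```

With a check like this in place, both appends in the reproduction would be rejected: the first because level 0 starts at 15 rather than 10, the second because it restarts at 0.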

OS, Python Version and ArcticDB Version

Python: 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)]
OS: Windows-10-10.0.22631-SP0
ArcticDB: dev

Backend storage used

No response

Additional Context

No response

vasil-pashov, May 10 '24 17:05