ArcticDB

Allow unordered indexes in staged area when `sort_and_finalize_staged_data` is used

Open · vasil-pashov opened this issue 6 months ago · 0 comments

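Is your feature request related to a problem? Please describe.

Currently, `sort_and_finalize_staged_data` requires the index of every staged segment to be sorted; otherwise an exception is thrown. In the example below, the exception is raised as soon as the unsorted frame is staged with `lib.write(..., staged=True)`, before `sort_and_finalize_staged_data` is even called.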

```python
import pandas as pd
import numpy as np
import arcticdb as adb

ac = adb.Arctic("lmdb://test")
lib = ac.get_library("test", create_if_missing=True)
dates = [np.datetime64('2023-01-03'), np.datetime64('2023-01-01'), np.datetime64('2023-01-05')]  # deliberately out of order
df = pd.DataFrame({"col": [2, 1, 3]}, index=dates)
lib.write("sym", df, staged=True)  # raises UnsortedDataException (see traceback below)
lib.sort_and_finalize_staged_data("sym")  # never reached
```

Output:

```
Traceback (most recent call last):
  File "...\test.py", line 9, in <module>
    lib.write("sym", df, staged=True)
  File "...\arcticdb\version_store\library.py", line 461, in write
    return self._nvs.write(
  File "...\arcticdb\version_store\_store.py", line 583, in write
    self.version_store.write_parallel(symbol, item, norm_meta, udm)
arcticdb_ext.exceptions.UnsortedDataException: E_UNSORTED_DATA When writing/appending staged data in parallel, input data must be sorted.
```
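Until unordered staged indexes are supported, one possible client-side workaround, assuming each chunk fits in memory, is to sort every chunk before staging it. The helper `sort_chunk_for_staging` is hypothetical (it is not part of ArcticDB) and relies only on pandas' `Index.is_monotonic_increasing` and `DataFrame.sort_index`.

```python
import numpy as np
import pandas as pd


def sort_chunk_for_staging(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return df with a sorted index.

    Staging currently rejects segments whose index is unsorted, so the
    chunk is sorted client-side first. An already sorted frame is
    returned unchanged, without copying.
    """
    if df.index.is_monotonic_increasing:
        return df
    # Stable sort: rows with equal timestamps keep their original order.
    return df.sort_index(kind="stable")


dates = [np.datetime64("2023-01-03"), np.datetime64("2023-01-01"), np.datetime64("2023-01-05")]
df = pd.DataFrame({"col": [2, 1, 3]}, index=dates)
sorted_df = sort_chunk_for_staging(df)

# With a library handle `lib` as in the example above, the sorted chunk
# could then be staged and finalized:
# lib.write("sym", sorted_df, staged=True)
# lib.sort_and_finalize_staged_data("sym")
```

Sorting per chunk should be sufficient as long as `sort_and_finalize_staged_data` only requires each segment to be sorted internally, as described above.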

Describe the solution you'd like

Allow unordered indexes in staged segments and sort them when `sort_and_finalize_staged_data` is called.
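The requested behavior can be modelled in memory with pandas: accept staged segments in any order, each with a possibly unsorted index, and establish ordering once over the combined data at finalize time. This is a hypothetical sketch (`sort_and_combine_segments` is an illustrative name), not ArcticDB's implementation, which operates on segments held in storage rather than on in-memory frames.

```python
import pandas as pd


def sort_and_combine_segments(segments: list[pd.DataFrame]) -> pd.DataFrame:
    """Hypothetical model of the requested finalize step.

    Segments are accepted with unordered indexes; ordering is established
    once, over the combined data, instead of being required per segment.
    """
    if not segments:
        raise ValueError("no staged segments to finalize")
    combined = pd.concat(segments)
    # Stable sort: rows sharing a timestamp keep the order of the segments passed in.
    return combined.sort_index(kind="stable")


seg_a = pd.DataFrame({"col": [3, 1]}, index=pd.to_datetime(["2023-01-03", "2023-01-01"]))
seg_b = pd.DataFrame({"col": [4, 2]}, index=pd.to_datetime(["2023-01-04", "2023-01-02"]))
result = sort_and_combine_segments([seg_b, seg_a])
print(result)
```

The stable sort is a design choice of this sketch; the issue does not specify how rows with identical timestamps should be ordered.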

vasil-pashov · Aug 01 '24 09:08