
Overlapping index check during update logic is incorrect

Open joe-iddon opened this issue 1 year ago • 2 comments

Describe the bug

While investigating an error, I found that it does not occur when there is a "gap" between the two indexes during an update(...) call, but does occur when there is no gap, even though the indexes do not overlap. This is suspicious, since we would expect the same behaviour (no error) in both cases. The error I'm investigating will be fixed soon (empty type fix), but this investigation reveals some logical errors in how we calculate whether timeseries overlap.
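To make the three cases concrete, here is a minimal sketch of how two index ranges can be classified as overlapping, adjacent (no gap but not overlapping) or separated by a gap. It is not ArcticDB's actual check: the `classify` function, the `Range` alias and the `step` parameter are hypothetical names introduced for illustration, and index values are assumed to be integer nanosecond timestamps, matching `pd.to_datetime([1,2,3])` in the reproduction.

```python
from typing import Tuple

# Hypothetical representation: (first, last) index values, both inclusive,
# expressed as integer nanoseconds since the epoch.
Range = Tuple[int, int]


def classify(existing: Range, update: Range, step: int = 1) -> str:
    """Return 'overlapping', 'adjacent' or 'gap' for two closed ranges.

    `step` is the assumed index resolution (1 ns here); two ranges are
    'adjacent' when their nearest endpoints are exactly one step apart.
    """
    e_start, e_end = existing
    u_start, u_end = update
    if e_start > e_end or u_start > u_end:
        raise ValueError("range start must not be after range end")

    # Two closed intervals overlap iff each one starts no later than
    # the other one ends.
    if u_start <= e_end and e_start <= u_end:
        return "overlapping"

    # The ranges are disjoint: measure the distance between the nearest endpoints.
    distance = u_start - e_end if u_start > e_end else e_start - u_end
    return "adjacent" if distance == step else "gap"


print(classify((1, 3), (4, 6)))  # first reproduction case: no gap, no overlap
print(classify((1, 3), (5, 7)))  # second reproduction case: a gap
print(classify((1, 3), (3, 5)))  # shares the timestamp 3
```

Under this definition only the third call describes a genuine overlap, so an update check based on it would treat the first two cases identically.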

Steps/Code to Reproduce

import arcticdb as adb
import pandas as pd
ac = adb.Arctic('lmdb:///tmp')
l2 = ac.get_library('test', create_if_missing=1)
l2._nvs._lib_cfg.lib_desc.version.write_options.dynamic_schema=True
l2.write('k', pd.DataFrame([None, None, None], columns=['a'], index=pd.to_datetime([1,2,3])))
l2.update('k', pd.DataFrame(['', '', ''], columns=['a'], index=pd.to_datetime([4,5,6])))
# Raises the following error, which should only occur for overlapping indexes:
# InternalException: E_ASSERTION_FAILURE Allocate data called with zero size

whereas

l2.write('k', pd.DataFrame(['', '', ''], columns=['a'], index=pd.to_datetime([1,2,3])))
l2.update('k', pd.DataFrame([None, None, None], columns=['a'], index=pd.to_datetime([5,6,7])))

does not show the error.

Expected Results

Consistent behaviour between the two cases: either an error in both cases or in neither, depending on whether the other bug has been fixed.

OS, Python Version and ArcticDB Version

Python: 3.8.12 | packaged by conda-forge | (default, Jan 30 2022, 23:42:07) [GCC 9.4.0]
OS: Linux-3.10.0-1160.102.1.el7.x86_64-x86_64-with-glibc2.10
ArcticDB: 4.0.3

Backend storage used

LMDB

Additional Context

No response

joe-iddon, Feb 06 '24 17:02

I can't replicate on arcticdb==4.5.0rc0.


In [1]: import arcticdb as adb
   ...: import pandas as pd
   ...: ac = adb.Arctic('lmdb:///tmp')
   ...: l2 = ac.get_library('test', create_if_missing=1)
   ...: l2._nvs._lib_cfg.lib_desc.version.write_options.dynamic_schema=True
   ...: l2.write('k', pd.DataFrame([None, None, None], columns=['a'], index=pd.to_datetime([1,2,3])))
   ...: l2.update('k', pd.DataFrame(['', '', ''], columns=['a'], index=pd.to_datetime([4,5,6])))
[2024-07-01 17:13:03.790] [arcticdb] [info] Column a does not have non null elements.
Out[1]: VersionedItem(symbol='k', library='test', data=n/a, version=3, metadata=None, host='LMDB(path=/tmp)', timestamp=1719850383798649002)

In [2]: l2.read('k')
Out[2]: VersionedItem(symbol='k', library='test', data=<class 'pandas.core.frame.DataFrame'>, version=3, metadata=None, host='LMDB(path=/tmp)', timestamp=1719850383798649002)

In [3]: l2.read('k').data
Out[3]:
                                  a
1970-01-01 00:00:00.000000001  None
1970-01-01 00:00:00.000000002  None
1970-01-01 00:00:00.000000003  None
1970-01-01 00:00:00.000000004
1970-01-01 00:00:00.000000005
1970-01-01 00:00:00.000000006

In [4]: l2.write('k', pd.DataFrame(['', '', ''], columns=['a'], index=pd.to_datetime([1,2,3])))
   ...: l2.update('k', pd.DataFrame([None, None, None], columns=['a'], index=pd.to_datetime([5,6,7])))
[2024-07-01 17:15:43.764] [arcticdb] [info] Column a does not have non null elements.
Out[4]: VersionedItem(symbol='k', library='test', data=n/a, version=5, metadata=None, host='LMDB(path=/tmp)', timestamp=1719850543765148898)

In [5]: l2.read('k').data
Out[5]:
                                  a
1970-01-01 00:00:00.000000001
1970-01-01 00:00:00.000000002
1970-01-01 00:00:00.000000003
1970-01-01 00:00:00.000000005  None
1970-01-01 00:00:00.000000006  None
1970-01-01 00:00:00.000000007  None

jamesmunro, Jul 01 '24 16:07

  • [ ] Add reproduction as test

jamesmunro, Jul 01 '24 16:07

Binary search shows this was fixed in #1227; a comprehensive non-regression test was added in #1978.

alexowens90, Nov 04 '24 16:11