ArcticDB
Overlapping index check during update logic is incorrect
Describe the bug
While investigating an error, I found that it does not occur when there is a "gap" between the two indexes in an update(...) call, but does occur when the indexes are adjacent (no gap, yet not overlapping). This is suspicious, since we would expect the same behaviour (no error) in both cases. The error itself will be fixed soon (empty type fix), but the investigation reveals logical errors in how we calculate whether timeseries overlap.
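For reference, the expected semantics can be sketched as a closed-interval overlap check. This is only an illustration of the behaviour described above, not ArcticDB's actual implementation; the `ranges_overlap` helper and the `Range` alias are hypothetical names, and index values are assumed to be integer nanosecond timestamps.

```python
from typing import Tuple

# Hypothetical alias: (first, last) index values, both inclusive, in nanoseconds.
Range = Tuple[int, int]


def ranges_overlap(existing: Range, incoming: Range) -> bool:
    """Return True if two closed ranges share at least one index value."""
    existing_start, existing_end = existing
    incoming_start, incoming_end = incoming
    # Closed intervals overlap exactly when each one starts no later than the other ends.
    return existing_start <= incoming_end and incoming_start <= existing_end


# Adjacent indexes (no gap, as in write [1,2,3] then update [4,5,6]).
adjacent = ranges_overlap((1, 3), (4, 6))
# Indexes with a gap (as in write [1,2,3] then update [5,6,7]).
gapped = ranges_overlap((1, 3), (5, 7))
# Indexes that share a timestamp.
touching = ranges_overlap((1, 3), (3, 5))
```

Under this definition neither the adjacent nor the gapped case counts as overlapping, so both would be expected to take the same code path; only ranges that share a timestamp overlap.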
Steps/Code to Reproduce
import arcticdb as adb
import pandas as pd
ac = adb.Arctic('lmdb:///tmp')
l2 = ac.get_library('test', create_if_missing=True)
l2._nvs._lib_cfg.lib_desc.version.write_options.dynamic_schema=True
l2.write('k', pd.DataFrame([None, None, None], columns=['a'], index=pd.to_datetime([1,2,3])))
l2.update('k', pd.DataFrame(['', '', ''], columns=['a'], index=pd.to_datetime([4,5,6])))
# Raises the following error, which should only occur for overlapping indexes:
InternalException: E_ASSERTION_FAILURE Allocate data called with zero size
whereas
l2.write('k', pd.DataFrame(['', '', ''], columns=['a'], index=pd.to_datetime([1,2,3])))
l2.update('k', pd.DataFrame([None, None, None], columns=['a'], index=pd.to_datetime([5,6,7])))
does not show the error.
Expected Results
Consistent behaviour between the two cases: either an error in both cases, or in neither, depending on whether the other bug has been fixed.
OS, Python Version and ArcticDB Version
Python: 3.8.12 | packaged by conda-forge | (default, Jan 30 2022, 23:42:07) [GCC 9.4.0] OS: Linux-3.10.0-1160.102.1.el7.x86_64-x86_64-with-glibc2.10 ArcticDB: 4.0.3
Backend storage used
LMDB
Additional Context
No response
I can't replicate on arcticdb==4.5.0rc0.
In [1]: import arcticdb as adb
...: import pandas as pd
...: ac = adb.Arctic('lmdb:///tmp')
...: l2 = ac.get_library('test', create_if_missing=1)
...: l2._nvs._lib_cfg.lib_desc.version.write_options.dynamic_schema=True
...: l2.write('k', pd.DataFrame([None, None, None], columns=['a'], index=pd.to_datetime([1,2,3])))
...: l2.update('k', pd.DataFrame(['', '', ''], columns=['a'], index=pd.to_datetime([4,5,6])))
[2024-07-01 17:13:03.790] [arcticdb] [info] Column a does not have non null elements.
Out[1]: VersionedItem(symbol='k', library='test', data=n/a, version=3, metadata=None, host='LMDB(path=/tmp)', timestamp=1719850383798649002)
In [2]: l2.read('k')
Out[2]: VersionedItem(symbol='k', library='test', data=<class 'pandas.core.frame.DataFrame'>, version=3, metadata=None, host='LMDB(path=/tmp)', timestamp=1719850383798649002)
In [3]: l2.read('k').data
Out[3]:
a
1970-01-01 00:00:00.000000001 None
1970-01-01 00:00:00.000000002 None
1970-01-01 00:00:00.000000003 None
1970-01-01 00:00:00.000000004
1970-01-01 00:00:00.000000005
1970-01-01 00:00:00.000000006
In [4]: l2.write('k', pd.DataFrame(['', '', ''], columns=['a'], index=pd.to_datetime([1,2,3])))
...: l2.update('k', pd.DataFrame([None, None, None], columns=['a'], index=pd.to_datetime([5,6,7])))
[2024-07-01 17:15:43.764] [arcticdb] [info] Column a does not have non null elements.
Out[4]: VersionedItem(symbol='k', library='test', data=n/a, version=5, metadata=None, host='LMDB(path=/tmp)', timestamp=1719850543765148898)
In [5]: l2.read('k').data
Out[5]:
a
1970-01-01 00:00:00.000000001
1970-01-01 00:00:00.000000002
1970-01-01 00:00:00.000000003
1970-01-01 00:00:00.000000005 None
1970-01-01 00:00:00.000000006 None
1970-01-01 00:00:00.000000007 None
- [ ] Add reproduction as test
Binary search shows this was fixed in #1227; a comprehensive non-regression test was added in #1978.