Poor performance when reading as_of a date with many early versions deleted

Open DrNickClarke opened this issue 11 months ago • 1 comment

Describe the bug

There is a feature called "tombstone all" that is supposed to stop the version search from walking the entire historical version list when the early versions have all been deleted.

It works when as_of is a version number: the read returns quickly, having not found the version.

However, when as_of is a date, the read can be slow. This is much more apparent on AWS, where latency is higher.

With thousands of versions, the read can take several minutes.

Steps/Code to Reproduce

    import arcticdb as adb
    import pandas as pd
    import numpy as np
    from datetime import datetime, timedelta

    # replace the placeholder with a real S3 connection string
    arctic = adb.Arctic("<AWS S3 uri>")
    lib = arctic.get_library('adb_bugs', create_if_missing=True)

    N = 3
    df = pd.DataFrame(
        index=pd.date_range("20240101", periods=N),
        data={'col': np.arange(0., N)}
    )

    # write 500 versions
    sym1 = 'asof_slow_read'
    for i in range(500):
        lib.write(sym1, df)

    # remove early versions
    lib.delete(sym1)

    # add one more version
    lib.write(sym1, df)

    # this is slow (12s in my test)
    as_of = datetime.now() - timedelta(days=1)
    lib.read(sym1, as_of=as_of)

    # this is fast (171ms in my test)
    lib.read(sym1, as_of=499)
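For a rough sense of scale, the timings in the reproduction can be extrapolated to larger version counts. This is only a back-of-envelope sketch: it assumes the slow read costs the same amount per deleted version walked and scales linearly, which is an assumption rather than a measurement, and `estimate_seconds` is a hypothetical helper, not part of ArcticDB.

```python
# Figures taken from the reproduction: 12 s to read as_of a date
# with 500 deleted versions behind the live one.
OBSERVED_SECONDS = 12.0
VERSIONS_WALKED = 500


def estimate_seconds(n_versions: int) -> float:
    """Estimated read time, ASSUMING cost is linear in versions walked."""
    return n_versions * OBSERVED_SECONDS / VERSIONS_WALKED


if __name__ == "__main__":
    per_version_ms = 1000 * OBSERVED_SECONDS / VERSIONS_WALKED
    print(f"about {per_version_ms:.0f} ms per version walked")
    for n in (1_000, 5_000, 10_000):
        print(f"{n} versions: about {estimate_seconds(n) / 60:.1f} minutes")
```

Under that assumption, a few thousand deleted versions would put the read in the range of minutes, which is consistent with the behaviour described above.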

Expected Results

Results are as expected. This is a performance issue.

OS, Python Version and ArcticDB Version

Python 3.10
Linux (Linux version 5.15.133.1-microsoft-standard-WSL2)
arcticdb 4.3.1

Backend storage used

AWS S3

Additional Context

This is possibly related to the issue linked below (failure to observe tombstone correctly). It may be easier to solve the two together.

https://github.com/man-group/ArcticDB/issues/1385

DrNickClarke, Feb 29 '24 16:02

This will be a failure to short-circuit on fast-tombstone all keys, as the logic is a bit more complex than when searching by exact version number.
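To illustrate the kind of short-circuit being discussed, here is a toy model in Python. It is not ArcticDB's implementation or data layout: the `Entry` class, the newest-first list, and all three functions are hypothetical, and the model assumes a tombstone-all entry marks every version with an id at or below its own as deleted. Each entry examined stands in for one storage read.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    kind: str        # "version" or "tombstone_all" (toy model only)
    version_id: int  # for tombstone_all: all ids <= this are deleted
    timestamp: int   # creation time in arbitrary units


def find_by_version(chain, target):
    """Walk newest to oldest; return (entry or None, entries read)."""
    reads = 0
    for e in chain:
        reads += 1
        if e.kind == "tombstone_all" and e.version_id >= target:
            return None, reads  # target is deleted, stop early
        if e.kind == "version" and e.version_id == target:
            return e, reads
    return None, reads


def find_by_time_no_short_circuit(chain, as_of):
    """Timestamp search that honours the tombstone but keeps walking."""
    reads = 0
    deleted_up_to = -1
    for e in chain:
        reads += 1
        if e.kind == "tombstone_all":
            deleted_up_to = max(deleted_up_to, e.version_id)
        elif e.timestamp <= as_of and e.version_id > deleted_up_to:
            return e, reads
    return None, reads


def find_by_time_short_circuit(chain, as_of):
    """Timestamp search that stops at the first tombstone-all entry."""
    reads = 0
    for e in chain:
        reads += 1
        if e.kind == "tombstone_all":
            return None, reads  # everything older is deleted
        if e.timestamp <= as_of:
            return e, reads
    return None, reads


# Mirror the reproduction: versions 0..499, delete all, then version 500.
chain = [Entry("version", 500, 1000), Entry("tombstone_all", 499, 999)]
chain += [Entry("version", v, v) for v in range(499, -1, -1)]

if __name__ == "__main__":
    print(find_by_version(chain, 499))
    print(find_by_time_no_short_circuit(chain, 600))
    print(find_by_time_short_circuit(chain, 600))
```

In this model the version-number lookup and the short-circuiting timestamp lookup both stop after two entries, while the variant without the short-circuit examines every deleted version before reporting that nothing was found.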

alexowens90, Mar 13 '24 16:03