Jedr Blaszyk
That looks very promising! Great that we have some handy benchmarks ready: https://github.com/tarekziade/bench-db! I'm checking how rocksDB would perform against them; [Flink uses rocksDB to store internal state](https://flink.apache.org/2021/01/18/using-rocksdb-state-backend-in-apache-flink-when-and-how/) EDIT: `python-rocksdb`...
From #748 > I benched dbm and sqlite3 for 1 Million docs -- dbm wins see https://github.com/tarekziade/bench-db I forked the bench repo and changed the benchmark scenario from 1Mil docs...
Another option: https://github.com/elastic/search-team/issues/4712 ## Scaling full sync with delete by query > For a full sync, instead of pre-loading all existing documents, we could add a timestamp field to indicate...
> Main concern is disk cleanup cause we can just drop down a host if we overuse disk and fail to clean up. Agreed, we would need to establish a...
@seanstory, following up on our discussion yesterday in chat. We can skip the document lookup/database altogether if we add a timestamp field to indicate the index time of each document....
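To make the timestamp idea concrete, here is a minimal sketch of the query body such a cleanup could use. It assumes every document is stamped at index time and that anything older than the start of the current full sync is stale. The function name `stale_docs_query` and the default field name `_timestamp` are placeholders for illustration, not something settled in this thread.

```python
from datetime import datetime, timezone


def stale_docs_query(sync_started_at: datetime, field: str = "_timestamp") -> dict:
    """Build a query body matching documents indexed before this sync started.

    `field` is a hypothetical name for the per-document index-time field.
    """
    return {
        "query": {
            "range": {
                # Anything stamped before the sync began was not re-indexed
                # by this sync, so it no longer exists in the source.
                field: {"lt": sync_started_at.isoformat()}
            }
        }
    }


if __name__ == "__main__":
    started = datetime(2024, 1, 1, tzinfo=timezone.utc)
    print(stale_docs_query(started))
```

The resulting body could then be sent to the Elasticsearch delete by query API once the sync has finished successfully; running it after a failed or partial sync would delete documents that simply were not reached.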
> We can actually use cursors for it, it's already available and from first glance look fit for our goal (cursors are updated after successful sync). ++ yeah, cursors could...
> I would personally not rely on timestamps and maybe add a metafield that refers document to a sync id that it was part of? Then we delete all docs...
> Sorry I'm a bit confused - do you also mean dropping timestamp from individual documents? No, those doc timestamps are still in the index. But without the local storage...
In the team sync we agreed on using disk-based lookup for self-managed connectors (managed in `config.yml`, same as with the extraction service), as it should be relatively easy to implement.
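For reference, a rough sketch of what a disk-based lookup could look like with the standard-library `dbm` module (the one benchmarked in bench-db). The helper names `record_existing_ids` and `diff_against_source` are hypothetical, and this is only one way to do it, not the agreed design: IDs already in the index are written to disk first, then source IDs are streamed against that store.

```python
import dbm


def record_existing_ids(db_path, existing_ids):
    """Persist the IDs currently in the index to an on-disk key-value store."""
    with dbm.open(db_path, "c") as db:
        for doc_id in existing_ids:
            db[doc_id.encode("utf-8")] = b"1"


def diff_against_source(db_path, source_ids):
    """Stream source IDs against the on-disk store.

    Returns (to_create, to_delete): IDs missing from the index, and IDs left
    in the index that the source no longer has.
    """
    to_create = []
    with dbm.open(db_path, "c") as db:
        for doc_id in source_ids:
            key = doc_id.encode("utf-8")
            if key in db:
                # Still present in the source: drop it from the pending set.
                del db[key]
            else:
                to_create.append(doc_id)
        # Whatever remains on disk was never seen in the source.
        to_delete = sorted(k.decode("utf-8") for k in db.keys())
    return to_create, to_delete
```

Only `to_create` and the final `to_delete` list are held in memory here; the set of existing IDs stays on disk, which is the point of the approach. Cleaning up the database file afterwards is left out and would need handling, per the disk cleanup concern above.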
@artem-shelkovnikov When I run make install, my laptop defaults to the latest version of Python 3, due to this line in the Makefile: https://github.com/elastic/connectors/blob/main/Makefile#L3 With Python 3.12, dependencies install fine,...