Lordworms
I'll start with a random dataset from here: https://registry.opendata.aws/
I have done some basic experiments with the bitcoin dataset and also did some profiling with instrument > FYI I think this is more like an Epic that can...
> > And then measure how much time is spent: > > that is very interesting > > > just want to know what is a good start to solving...
> @Lordworms if I recall correctly, the S3 list call is made on every query, and if the number of files is large this can be non-trivial, so if the...
> but to me, starting with the metadata would be a good place to start, since to my understanding the built-in cache would be better suited for something like object...
> One additional comment, I'm not sure how the existing cache would perform for range queries. For example, we use a trie in our implementation. > > Depending on your...
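To make the range-query concern concrete: a hash-based cache answers point lookups well but has no key ordering, so listing every object under a prefix means scanning all entries. An ordered structure avoids that. The sketch below is only an illustration using the standard library's `BTreeMap` in place of a trie; it is not the commenter's implementation, and `list_prefix` plus the path-to-size index are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Return every cached entry whose path starts with `prefix`.
/// `index` maps object path -> object size (a hypothetical layout).
fn list_prefix<'a>(index: &'a BTreeMap<String, u64>, prefix: &str) -> Vec<(&'a str, u64)> {
    index
        // Keys are sorted, so start at the first key >= prefix...
        .range(prefix.to_string()..)
        // ...and stop as soon as a key no longer shares the prefix.
        .take_while(|(path, _)| path.starts_with(prefix))
        .map(|(path, size)| (path.as_str(), *size))
        .collect()
}
```

The work here is proportional to the number of matching keys plus a logarithmic seek, rather than to the size of the whole cache.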
I have implemented a basic LRU metadata cache, and I think caching only the metadata would give a slight performance improvement (we call the List_Object API just once but call the Get_Object...
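For readers following along, here is a minimal sketch of what an LRU cache for object metadata can look like. It is not the implementation from this thread: `LruMetaCache` and the trimmed-down `ObjectMeta` struct are hypothetical, it uses only the standard library, and it is single-threaded, so a real cache would need locking or a concurrent map.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for the metadata returned by a list call.
#[derive(Clone, Debug, PartialEq)]
struct ObjectMeta {
    size: u64,
    e_tag: Option<String>,
}

/// Minimal LRU cache keyed by object path (illustrative only).
struct LruMetaCache {
    capacity: usize,
    entries: HashMap<String, ObjectMeta>,
    // Front = least recently used, back = most recently used.
    order: VecDeque<String>,
}

impl LruMetaCache {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { capacity, entries: HashMap::new(), order: VecDeque::new() }
    }

    /// Move `path` to the most-recently-used position.
    fn touch(&mut self, path: &str) {
        if let Some(pos) = self.order.iter().position(|p| p == path) {
            self.order.remove(pos);
        }
        self.order.push_back(path.to_string());
    }

    /// Look up metadata; a hit refreshes the entry's recency.
    fn get(&mut self, path: &str) -> Option<ObjectMeta> {
        let meta = self.entries.get(path).cloned()?;
        self.touch(path);
        Some(meta)
    }

    /// Insert or update metadata, evicting the least recently used
    /// entry when a new key would exceed the capacity.
    fn put(&mut self, path: &str, meta: ObjectMeta) {
        if !self.entries.contains_key(path) && self.entries.len() == self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.entries.remove(&oldest);
            }
        }
        self.entries.insert(path.to_string(), meta);
        self.touch(path);
    }
}
```

The `touch` step is linear in the number of entries, which keeps the sketch short; a production cache would typically use a linked-list-backed structure for constant-time updates.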
> @Lordworms thanks for the work on this. > > Just to confirm - what was the improvement in milliseconds we saw from the object meta cache? For context, in...
> That's interesting that dashmap performed better. Would you mind also doing a query with a filter / pruning involved and comparing the results (perhaps with a range scan as...