Lordworms

51 comments by Lordworms

I'll start with a random dataset from https://registry.opendata.aws/

I have done some basic experiments with the bitcoin dataset ![17347f2f94015d8396ec20a0817a6f09](https://github.com/apache/arrow-datafusion/assets/48054792/84beed34-f1f9-4f3f-b485-c7a312a9778f) and also did some profiling with Instruments.

> FYI I think this is more like an Epic that can...

> > And then measure how much time is spent:
>
> that is very interesting
>
> > just want to know what is a good start to solving...

> @Lordworms if I recall correctly, the S3 list call is made on every query, and if the number of files is large this can be non-trivial, so if the...

> but to me the metadata would be a good place to start, as to my understanding the built-in cache would be better suited for something like object...

> One additional comment: I'm not sure how the existing cache would perform for range queries. For example, we use a trie in our implementation.
>
> Depending on your...
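
The quoted comment's point about range queries comes down to key ordering: a hash-keyed cache can only answer exact-path lookups, while an ordered structure can answer "everything under this prefix". The comment mentions a trie; the sketch below swaps in std's `BTreeMap` as a simpler ordered stand-in, because keys sharing a prefix form one contiguous sorted range. `MetaEntry` and `PrefixCache` are hypothetical names for illustration, not DataFusion or object_store APIs.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for per-object metadata (not object_store's `ObjectMeta`).
#[derive(Clone, Debug, PartialEq)]
pub struct MetaEntry {
    pub size: u64,
}

/// Hypothetical cache keyed by object path. Because `BTreeMap` keeps keys
/// sorted, a prefix lookup is a range scan instead of a full-map scan.
#[derive(Default)]
pub struct PrefixCache {
    entries: BTreeMap<String, MetaEntry>,
}

impl PrefixCache {
    pub fn insert(&mut self, path: &str, meta: MetaEntry) {
        self.entries.insert(path.to_string(), meta);
    }

    /// Return every cached path that starts with `prefix`, in sorted order.
    pub fn list_prefix(&self, prefix: &str) -> Vec<(&str, &MetaEntry)> {
        self.entries
            // Start at the first key >= prefix.
            .range(prefix.to_string()..)
            // Keys are sorted, so the first key without the prefix ends the scan.
            .take_while(|(k, _)| k.starts_with(prefix))
            .map(|(k, v)| (k.as_str(), v))
            .collect()
    }
}

fn main() {
    let mut cache = PrefixCache::default();
    cache.insert("year=2023/part-0.parquet", MetaEntry { size: 10 });
    cache.insert("year=2024/part-0.parquet", MetaEntry { size: 20 });
    cache.insert("year=2024/part-1.parquet", MetaEntry { size: 30 });
    let hits = cache.list_prefix("year=2024/");
    println!("{}", hits.len()); // prints 2
}
```

A real trie would share prefix storage between keys, but for answering prefix listings the sorted map gives the same result with less code.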

I have implemented a basic LRU metadata cache, and I think just caching the metadata would give a slight performance improvement (we call the List_Object API just once but call the Get_Object...
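
The comment does not show its implementation, so the following is only a minimal std-only sketch of the general idea of an LRU metadata cache: a `HashMap` holds each entry with its last-use tick, and a `BTreeMap` from tick to key makes the least recently used entry the first key. `FileMeta` and `LruMetaCache` are hypothetical names, not DataFusion types.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical metadata record; a real cache would store object_store metadata.
#[derive(Clone, Debug, PartialEq)]
pub struct FileMeta {
    pub size: u64,
}

/// Minimal LRU cache: `map` holds values plus their last-use tick, and
/// `order` maps tick -> key so the smallest tick is the LRU entry.
pub struct LruMetaCache {
    capacity: usize,
    tick: u64,
    map: HashMap<String, (FileMeta, u64)>,
    order: BTreeMap<u64, String>,
}

impl LruMetaCache {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { capacity, tick: 0, map: HashMap::new(), order: BTreeMap::new() }
    }

    fn next_tick(&mut self) -> u64 {
        self.tick += 1;
        self.tick
    }

    /// Look up `path`; a hit marks the entry as most recently used.
    pub fn get(&mut self, path: &str) -> Option<FileMeta> {
        let t = self.next_tick();
        let (meta, last) = self.map.get_mut(path)?;
        let old = *last;
        self.order.remove(&old);
        *last = t;
        self.order.insert(t, path.to_string());
        Some(meta.clone())
    }

    /// Insert or refresh `path`, evicting the least recently used entry when full.
    pub fn put(&mut self, path: &str, meta: FileMeta) {
        let t = self.next_tick();
        if let Some((_, old)) = self.map.remove(path) {
            // Refreshing an existing key: drop its stale recency slot.
            self.order.remove(&old);
        } else if self.map.len() >= self.capacity {
            // The smallest tick is the least recently used key.
            if let Some((_, lru_key)) = self.order.pop_first() {
                self.map.remove(&lru_key);
            }
        }
        self.map.insert(path.to_string(), (meta, t));
        self.order.insert(t, path.to_string());
    }

    pub fn len(&self) -> usize {
        self.map.len()
    }
}

fn main() {
    let mut cache = LruMetaCache::new(2);
    cache.put("a.parquet", FileMeta { size: 1 });
    cache.put("b.parquet", FileMeta { size: 2 });
    cache.get("a.parquet"); // "a" is now most recently used
    cache.put("c.parquet", FileMeta { size: 3 }); // evicts "b"
    println!("{:?}", cache.get("b.parquet")); // prints None
}
```

Each `get` and `put` costs O(log n) for the recency index; a production cache would also need to be shared across concurrent queries, which this single-threaded sketch does not address.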

> @Lordworms thanks for the work on this.
>
> Just to confirm, what was the improvement in milliseconds we saw from the object meta cache? For context, in...

> That's interesting that dashmap performed better. Would you mind also running a query with a filter / pruning involved and comparing the results (perhaps with a range scan as...