Adam Gutglick

Results 50 comments of Adam Gutglick

For DataFusion specifically, it has an optional memory back-pressure system we can opt into, but what's the root cause here?

Is this with the most recent release?

Sorry, I got buried in other issues. I checked just now and I don't see any new data. Do you see this behavior with the files you sent us for #4466?

I think @onursatici is looking into this and similar issues now, so it should be better by the next release.

There are definitely a bunch of good improvements related to DataFusion and to wider file schemas. I'm not sure we got to the root of this issue yet, but I...

Just to expand on the context - I would love to support DataFusion dynamic expressions, but they are extremely flexible. They probably don't fit with our current dynamic expression implementation...

It was just a failing benchmark. @joseph-isaacs noted that it might make sense to split up some of the benchmarks so they can make progress independently.

Was going to add tests, but then I ran into [this](https://github.com/apache/datafusion/issues/18513) issue, which took me a while to repro.

Yeah, it's basically me wanting to dive into the benchmarks. There are some cases here where I expected it to make more of a difference, and obviously some regressions too....

We've also run into `ChunkedArray::take` being really bad in the "shuffle" case where the indices aren't sorted, potentially creating `O(len(take_array))` chunks.
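To illustrate the "shuffle" problem, here is a toy Python model, not the actual `ChunkedArray::take` implementation. It assumes a naive take that starts a new output chunk whenever two consecutive indices land in different source chunks; `naive_chunked_take` and the list-of-lists layout are hypothetical names made up for this sketch.

```python
import bisect


def naive_chunked_take(chunks, indices):
    """Toy take over a list of chunks (hypothetical helper, not a real API).

    A new output chunk is started every time consecutive indices fall into
    different source chunks, which is the assumed source of fragmentation.
    """
    # offsets[i] is the global index of the first element of chunk i.
    offsets = [0]
    for chunk in chunks:
        offsets.append(offsets[-1] + len(chunk))

    out = []
    current_chunk_id = None
    for idx in indices:
        # Binary search for the source chunk that contains this global index.
        chunk_id = bisect.bisect_right(offsets, idx) - 1
        if chunk_id != current_chunk_id:
            out.append([])
            current_chunk_id = chunk_id
        out[-1].append(chunks[chunk_id][idx - offsets[chunk_id]])
    return out


# Two source chunks of four elements each; values equal their global index.
chunks = [[0, 1, 2, 3], [4, 5, 6, 7]]

sorted_idx = [0, 1, 2, 3, 4, 5, 6, 7]
# Worst case for this model: every index lands in a different chunk than
# the previous one.
alternating_idx = [0, 4, 1, 5, 2, 6, 3, 7]

sorted_out = naive_chunked_take(chunks, sorted_idx)
alternating_out = naive_chunked_take(chunks, alternating_idx)

print(len(sorted_out))       # one output chunk per source chunk touched
print(len(alternating_out))  # one output chunk per index
```

With sorted indices the number of output chunks is bounded by the number of source chunks, while the alternating pattern yields one output chunk per index. One common mitigation in a model like this is to gather into a single contiguous buffer, or to sort the indices first and restore the requested order afterwards; whether either fits the real code path is not something this sketch can say.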