Nic Crane

Results: 154 comments by Nic Crane

Here's the query plan (the dataset has a lot of columns): ``` ExecPlan with 3 nodes: 2:SinkNode{} 1:ProjectNode{projection=[SPORDER, RT, SERIALNO, PUMA, ST, ADJUST, PWGTP, AGEP, CIT, COW, DDRS, DEYE, DOUT,...

Hmm, could be an R bug or something already solved actually; I ran the following (different query, but similarly problematic in R) with pyarrow: ``` import pyarrow as pa import...

OK, this is the actual plan: ``` ExecPlan with 3 nodes: 2:ConsumingSinkNode{} 1:ProjectNode{projection=[SPORDER, RT, SERIALNO, PUMA, ST, ADJUST, PWGTP, AGEP, CIT, COW, DDRS, DEYE, DOUT, DPHY, DREM, DWRK, ENG, FER,...

It would be nice if the ConsumingSinkNode printed the values of the WriteNodeOptions so we could compare with pyarrow. But glancing at the defaults, they look the same (more or...

> How were you measuring RAM? Were you looking at the RSS of the process? Or were you looking at the amount of free/available memory? I was just looking at...

Thanks! And when you say "increase without bound", how would I know that's happening?
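One possible way to check for growth "without bound" is to sample the process's peak RSS after each unit of work and see whether it plateaus. The sketch below is only an illustration under assumptions: the helper names (`peak_rss_mib`, `sample_peak_rss`, `keeps_growing`), the `tail` window, and the 1 MiB tolerance are hypothetical and not taken from the thread. It uses only Python's standard `resource` module, which is Unix-only, and it measures the RSS of the Python process rather than anything Arrow reports itself.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Peak resident set size of this process, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def sample_peak_rss(step, n_steps):
    """Call step(i) n_steps times, recording peak RSS after each call."""
    samples = []
    for i in range(n_steps):
        step(i)
        samples.append(peak_rss_mib())
    return samples


def keeps_growing(samples, tail=5, tolerance_mib=1.0):
    """Hypothetical heuristic: did peak RSS still rise over the last `tail` samples?"""
    if len(samples) < tail:
        return False
    recent = samples[-tail:]
    return recent[-1] - recent[0] > tolerance_mib
```

Here `step` would be whatever unit of work is under suspicion (for example, writing one batch). Because peak RSS never decreases, bounded memory use should show up as a flat tail in the samples, while a tail that keeps rising suggests growth that has not levelled off.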

OK, so I've been experimenting with various combinations of this, and have found that it happens with both Python and R, so it looks like a C++ issue. I'm running this...

I think we might be rehashing part of a conversation we already had a long time ago in https://github.com/apache/arrow/issues/18944#issuecomment-1377665189

I tried it with `mta_tax`, which has 385 distinct values, and it also crashes. But I'd expect that, seeing as the data isn't already partitioned on that variable and it'd...