Ben Harkins
A couple of notes on this:
- I decided to hold off on documenting threading/reentrancy characteristics until I could get a second opinion, as this was my first exposure to the...
Might be good to go. I can also confirm that the CSV tests fail without these changes.
> I think it would be nice to first submit the benchmark changes as a separate PR.

Will do. Should I keep this one open or convert it to a...
The benchmark PR is now up: https://github.com/apache/arrow/pull/14552.
@pitrou I still have to do the docs, but feel free to bring up any further points on the code in the meantime.
I just opened a small PR that addresses the chunker issue: https://github.com/apache/arrow/pull/14843. If that gets merged, I have a test I can add for the `newlines_in_values = true` case (which...
@pitrou Just pushed a fix. Also, not sure if my rationale on the `std::remove` table validation stuff from yesterday is sound - but if not, I can change that too.
@amol- Yeah, the plan is to open a new PR with a different approach. I'll go ahead and close this.
@anjakefala Agreed that everything seems to be in place. I'll be starting the vote on the ML later today.
Rebased on the latest changes. I compared against master locally and I'm seeing roughly +12%, +16%, and +25% bytes/sec for 10, 100, and 1000 fields, respectively (in the ordered/non-sparse cases).