Antoine Pitrou comments

Results: 823 comments by Antoine Pitrou

PARQUET-758: Add Float16/Half-float logical type

> @shangxinli are there guidelines for what needs to happen to accept this addition? I suppose it needs a discussion and then a formal vote on the ML?

PARQUET-758: Add Float16/Half-float logical type

> I might have missed it, but I didn't see Julien's reply on the dev mailing list. This seems reasonable though. For full disclosure, it was a discussion involving the...

PARQUET-758: Add Float16/Half-float logical type

> FWIW, I rather think it should be a physical type for the following reasons:
>
> * encodings are currently only defined on the physical type, not the logical...

Umbrella issue: Switching from Jira to GitHub Issues

* Issue type: it would be nice if we could keep a distinction between user-visible improvements ("Improvement" or "Feature request") and internal improvements such as refactors ("Task").

Umbrella issue: Switching from Jira to GitHub Issues

Can we also try to migrate the JIRA labels `good-first-issue` and `good-second-issue`? They are useful to mark issues suitable for fledgling contributors.

PARQUET-2204: [parquet-cpp] TypedColumnReaderImpl::Skip should reuse scratch space

Can you merge the latest changes from git master?

PARQUET-2204: [parquet-cpp] TypedColumnReaderImpl::Skip should reuse scratch space

I get the following benchmark numbers:

* before:
```
-------------------------------------------------------------------------------------------------------------------
Benchmark                                                  Time             CPU   Iterations UserCounters...
-------------------------------------------------------------------------------------------------------------------
ColumnReaderSkipInt32/Repetition:0/BatchSize:100        2256949 ns      2256574 ns          310 bytes_per_second=2.1131G/s
ColumnReaderSkipInt32/Repetition:0/BatchSize:1000        322274 ns       322302 ns         2147 bytes_per_second=14.7947G/s...
```

ARROW-4709: [C++] Optimize for ordered JSON fields

Right, it seems the speedup is relatively minor. I get these results:
```
----------------------------------------------------------------------------------------------------------------------
Non-regressions: (39)
----------------------------------------------------------------------------------------------------------------------
                                                    benchmark         baseline        contender  change %  counters
ParseJSONFields/ordered:1/schema:1/sparsity:0/num_fields:1000  141.204 MiB/sec  163.772 MiB/sec    15.983  {'family_index':...
```

ARROW-4709: [C++] Optimize for ordered JSON fields

Thanks. Here are the updated benchmark numbers that I get:
```
----------------------------------------------------------------------------------------------------------------------
Non-regressions: (40)
----------------------------------------------------------------------------------------------------------------------
                                                    benchmark         baseline        contender  change %  counters
ParseJSONFields/ordered:1/schema:0/sparsity:0/num_fields:1000  137.627 MiB/sec  166.086 MiB/sec    20.679  {'family_index': 5, 'per_family_instance_index':...
```

ARROW-4709: [C++] Optimize for ordered JSON fields

The AWS-related test failures are unrelated to this PR.