Antoine Pitrou comments

Results: 823 comments of Antoine Pitrou

GH-25118: [Python] Make NumPy an optional runtime dependency

I'm really -1 on the current testing approach where everything is disabled by default and tests have to be whitelisted **twice** to be executed.

GH-25118: [Python] Make NumPy an optional runtime dependency

@github-actions crossbow submit -g wheel -g python

[Python] pa.array raises for mixed scalar types (float16 + int)

This bug can still be reproduced. @raulcd @AlenkaF

Parquet compatibility / integration testing

> Note that Arrow is somewhat different than Parquet in that most of the Arrow implementations are maintained by the Apache Arrow project itself. In comparison, I believe most of...

Parquet compatibility / integration testing

@alkis
> Cons:
>
> * `carpenter` has a bit of complexity - it needs to be able to decode a subset of parquet to verify equivalence
> * drivers...

Parquet compatibility / integration testing

> Second best would be option 3, but I'm curious how often an implementation would be expected to provide files? The full set for each release, or just one for...

Parquet compatibility / integration testing

> > Need to host all important implementations under a single CI job (including closed-source ones? including GPU ones?).
>
> This is a good point. Does it apply to...

GH-46818: [Docs][C++] Add missing method description in type.h

@zanmato1984 In case you want to chime in.

[C++][Parquet] Integer dictionary bitwidth preservation breaks multi-file read behaviour in pyarrow 20

Normally, the dataset tries to normalize schemas when reading the files in a dataset. Apparently that doesn't work for dictionary types; we should fix this.

[C++][Parquet] Integer dictionary bitwidth preservation breaks multi-file read behaviour in pyarrow 20

> Or are you talking about the call to `UnifySchemas` in `DatasetFactory::Inspect`?

It should be this, indeed.

> It seems like doing it this way in Python is currently impossible...