Antoine Pitrou
I'm really -1 on the current testing approach where everything is disabled by default and tests have to be whitelisted **twice** to be executed.
@github-actions crossbow submit -g wheel -g python
This bug can still be reproduced. @raulcd @AlenkaF
> Note that Arrow is somewhat different than Parquet in that most of the Arrow implementations are maintained by the Apache Arrow project itself. In comparison, I believe most of...
@alkis
> Cons:
>
> * `carpenter` has a bit of complexity - it needs to be able to decode a subset of parquet to verify equivalence
> * drivers...
> Second best would be option 3, but I'm curious how often an implementation would be expected to provide files? The full set for each release, or just one for...
> > Need to host all important implementations under a single CI job (including closed-source ones? including GPU ones?).
>
> This is a good point. Does it apply to...
@zanmato1984 In case you want to chime in.
Normally, dataset tries to normalize schemas when reading the files in a dataset. Apparently that doesn't work for dictionary types; we should fix this.
> Or are you talking about the call to `UnifySchemas` in `DatasetFactory::Inspect`?

It should be this, indeed.

> It seems like doing it this way in Python is currently impossible...