asfimport
ParquetMR contains a suite of self-tests. When one of those self-tests fails, it would be nice to be able to pull up the test in an IDE like IntelliJ. Then...
We lack proper integration tests between components. Fortunately, we already have a git repository for uploading test data: https://github.com/apache/parquet-testing. The idea is the following. Create a directory...
@pmouawad ([Bug 63456](https://bz.apache.org/bugzilla//show_bug.cgi?id=63456&redirect=false)): Hello, this could be a good start for future HTTP/2 support. Regards. OS: All
For consistency with S3FileSystem and others. See discussion at https://github.com/apache/arrow/pull/13404#discussion_r901799543 **Reporter**: [Neal Richardson](https://issues.apache.org/jira/browse/ARROW-16884) / @nealrichardson **Note**: *This issue was originally created as [ARROW-16884](https://issues.apache.org/jira/browse/ARROW-16884). Please see the [migration documentation](https://github.com/apache/arrow/issues/14542) for further...
It's not possible to open an `abfs://` or `abfss://` URI with `pyarrow.fs.HadoopFileSystem`. Using `HadoopFileSystem.from_uri(path)` does not work, and libhdfs will throw an error saying that the authority is invalid...
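The "authority is invalid" error is plausibly about how the URI's authority component is interpreted. A minimal stdlib sketch (not pyarrow, and the URI below is a hypothetical example) showing what the authority of an `abfss://` URI looks like, which is what libhdfs would be asked to interpret:

```python
from urllib.parse import urlparse

# Hypothetical example URI: in abfs/abfss URIs the authority encodes
# "container@account.dfs.core.windows.net", a form that a plain
# host[:port] parser would reject as invalid.
uri = "abfss://container@account.dfs.core.windows.net/path/file.parquet"
parts = urlparse(uri)
print(parts.netloc)  # container@account.dfs.core.windows.net
print(parts.path)    # /path/file.parquet
```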
Currently, when writing a dataset, e.g. from a table consisting of a set of record batches, there is no guarantee that the row order is preserved when reading the dataset....
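A toy stdlib model of why order is not preserved (this is not pyarrow's implementation, just an illustration of the general mechanism): rows are split across fragment files at write time, and at read time fragments are discovered by listing, so the global row order follows scan order rather than write order.

```python
import os
import tempfile

# Toy model: a "dataset" is a directory of fragment files.
rows = list(range(10))
d = tempfile.mkdtemp()

# Write two fragments; note the writer happens to put the first
# half of the rows into the later-sorting file name.
with open(os.path.join(d, "part-1.txt"), "w") as f:
    f.write("\n".join(map(str, rows[:5])))
with open(os.path.join(d, "part-0.txt"), "w") as f:
    f.write("\n".join(map(str, rows[5:])))

# Read back in sorted listing order, as a scanner might.
out = []
for name in sorted(os.listdir(d)):
    with open(os.path.join(d, name)) as f:
        out.extend(int(line) for line in f)
print(out)  # [5, 6, 7, 8, 9, 0, 1, 2, 3, 4] — not the write order
```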
Test runs of parquet-hadoop with `-Dhadoop.version=3.4.0` fail because there's a logback jar on the classpath, which breaks the tests (seemingly because it suddenly logs at debug level). HADOOP-19084 should have...
@rdblue pointed me to which provides non-native implementations of compression codecs. It claims to be much faster than the native wrappers that Parquet uses. This Jira is to track the work...
Parquet MR 1.8.2 does not support reading row groups larger than 2 GB. See: https://github.com/apache/parquet-mr/blob/parquet-1.8.x/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java#L1064 We are seeing this when writing skewed records. This throws off the estimation of...
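The 2 GB ceiling is consistent with Java sizing buffers with a 32-bit `int` (`Integer.MAX_VALUE` bytes). A quick arithmetic check, with the 3 GiB row-group size below being a hypothetical example of what skewed records could produce:

```python
# Java array/ByteBuffer capacities are ints, so a single buffer tops
# out at Integer.MAX_VALUE = 2**31 - 1 bytes (just under 2 GiB).
INT_MAX = 2**31 - 1

# Hypothetical oversized row group produced by skewed records.
row_group_bytes = 3 * 1024**3  # 3 GiB

print(row_group_bytes > INT_MAX)  # True: cannot fit in one int-sized buffer
```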
The command result is as follows:

```
parquet-tools bloom-filter BloomFilter.snappy.parquet

row-group 0:
bloom filter for column id:
NONE

bloom filter for column uuid:
Hash strategy: block
Algorithm: block
Compression: uncompressed
Bitset size: 1048576...
```