asfimport
ParquetMR contains a suite of self-tests. When one of those self-tests fails, it would be nice to be able to pull up the test in an IDE like IntelliJ. Then...
We lack proper integration tests between components. Fortunately, we already have a git repository for uploading test data: https://github.com/apache/parquet-testing. The idea is the following. Create a directory...
@pmouawad ([Bug 63456](https://bz.apache.org/bugzilla//show_bug.cgi?id=63456&redirect=false)): Hello, this could be a good start for future HTTP/2 support. Regards. OS: All
For consistency with S3FileSystem and others. See discussion at https://github.com/apache/arrow/pull/13404#discussion_r901799543 **Reporter**: [Neal Richardson](https://issues.apache.org/jira/browse/ARROW-16884) / @nealrichardson **Note**: *This issue was originally created as [ARROW-16884](https://issues.apache.org/jira/browse/ARROW-16884). Please see the [migration documentation](https://github.com/apache/arrow/issues/14542) for further...
It's not possible to open an `abfs://` or `abfss://` URI with `pyarrow.fs.HadoopFileSystem`. Using `HadoopFileSystem.from_uri(path)` does not work, and libhdfs will throw an error saying that the authority is invalid...
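The "authority is invalid" error is plausibly about how the URI's authority component is interpreted. A minimal stdlib sketch (not pyarrow, and the URI below is a hypothetical example) showing what the authority of an `abfss://` URI looks like, which is what libhdfs would be asked to interpret:

```python
from urllib.parse import urlparse

# Hypothetical example URI: in abfs/abfss URIs the authority encodes
# "container@account.dfs.core.windows.net", a form that a plain
# host[:port] parser would reject as invalid.
uri = "abfss://container@account.dfs.core.windows.net/path/file.parquet"
parts = urlparse(uri)
print(parts.netloc)  # container@account.dfs.core.windows.net
print(parts.path)    # /path/file.parquet
```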
Currently, when writing a dataset, e.g. from a table consisting of a set of record batches, there is no guarantee that the row order is preserved when reading the dataset....
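A toy stdlib model of why order is not preserved (this is not pyarrow's implementation, just an illustration of the general mechanism): rows are split across fragment files at write time, and at read time fragments are discovered by listing, so the global row order follows scan order rather than write order.

```python
import os
import tempfile

# Toy model: a "dataset" is a directory of fragment files.
rows = list(range(10))
d = tempfile.mkdtemp()

# Write two fragments; note the writer happens to put the first
# half of the rows into the later-sorting file name.
with open(os.path.join(d, "part-1.txt"), "w") as f:
    f.write("\n".join(map(str, rows[:5])))
with open(os.path.join(d, "part-0.txt"), "w") as f:
    f.write("\n".join(map(str, rows[5:])))

# Read back in sorted listing order, as a scanner might.
out = []
for name in sorted(os.listdir(d)):
    with open(os.path.join(d, name)) as f:
        out.extend(int(line) for line in f)
print(out)  # [5, 6, 7, 8, 9, 0, 1, 2, 3, 4] — not the write order
```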
Test runs of parquet-hadoop with `-Dhadoop.version=3.4.0` fail because there's a logback jar on the classpath, which breaks the tests (seemingly because it suddenly logs at debug level). HADOOP-19084 should have...
@rdblue pointed me to which provides non-native implementations of compression codecs. It claims to be much faster than the native wrappers that Parquet uses. This Jira is to track the work...
Parquet MR 1.8.2 does not support reading row groups larger than 2 GB. See: https://github.com/apache/parquet-mr/blob/parquet-1.8.x/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetFileReader.java#L1064 We are seeing this when writing skewed records. This throws off the estimation of...
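The 2 GB ceiling is consistent with Java sizing buffers with a 32-bit `int` (`Integer.MAX_VALUE` bytes). A quick arithmetic check, with the 3 GiB row-group size below being a hypothetical example of what skewed records could produce:

```python
# Java array/ByteBuffer capacities are ints, so a single buffer tops
# out at Integer.MAX_VALUE = 2**31 - 1 bytes (just under 2 GiB).
INT_MAX = 2**31 - 1

# Hypothetical oversized row group produced by skewed records.
row_group_bytes = 3 * 1024**3  # 3 GiB

print(row_group_bytes > INT_MAX)  # True: cannot fit in one int-sized buffer
```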
The command result is as follows:

```
parquet-tools bloom-filter BloomFilter.snappy.parquet

row-group 0:
bloom filter for column id:
NONE

bloom filter for column uuid:
Hash strategy: block
Algorithm: block
Compression: uncompressed
Bitset size: 1048576...
```