asfimport
It looks like the parquet-java merge script is set up to run only on Python 2, which is EOL. We should update it to run on Python 3. I plan to do...
They are currently not supported. They would need their own set of operators, such as contains() and size(). **Reporter**: [Alex Levenson](https://issues.apache.org/jira/secure/ViewProfile.jspa?name=alexlevenson) / @isnotinvain #### PRs and other links: - [GitHub...
Parquet relies on field names. In many usages, e.g. schema resolution, this is a problem. Iceberg uses field IDs and stores ID/name mappings. This Jira is to add...
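As background for the ID-based resolution described above, Parquet's schema builder already allows attaching an integer ID to each field. A minimal sketch (the record and field names here are hypothetical, chosen only for illustration):

```java
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName;
import org.apache.parquet.schema.Types;

public class FieldIdSketch {
  public static void main(String[] args) {
    // Build a schema whose fields carry explicit IDs, so a reader could
    // resolve columns by ID rather than by name (the approach Iceberg takes).
    MessageType schema = Types.buildMessage()
        .required(PrimitiveTypeName.INT64).id(1).named("user_id")
        .optional(PrimitiveTypeName.BINARY).id(2).named("user_name")
        .named("user");
    System.out.println(schema);
  }
}
```

Resolving by ID makes the mapping survive column renames, which is exactly where name-based resolution breaks down.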
We sometimes run into an exception when closing a ParquetWriter instance: ```java 2024-06-10 10:44:01.398 org.apache.parquet.util.AutoCloseables$ParquetCloseResourceException: Unable to close resource 2024-06-10 10:44:01.398 at org.apache.parquet.util.AutoCloseables.uncheckedClose(AutoCloseables.java:85) 2024-06-10 10:44:01.398 at org.apache.parquet.util.AutoCloseables.uncheckedClose(AutoCloseables.java:94) 2024-06-10 10:44:01.398 at...
The Java Parquet library has no usage documentation besides the sparse information available in the README. The only things I could find were a few decade-old third-party tutorials...
In an effort to understand the parquet format better, I've so far written my own Thrift parser, and upon examining the output, I noticed something peculiar. To begin with, check...
Hi, we are trying to use [org.apache.parquet.avro](https://www.tabnine.com/code/java/packages/org.apache.parquet.avro).AvroParquetWriter to write a Parquet file to an S3 bucket. The file is successfully written to the S3 bucket, but we get an exception: com.amazonaws.SdkClientException: Unable to verify...
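For context on the setup described above, a minimal AvroParquetWriter sketch targeting an `s3a://` path might look like the following. The bucket, key, and record schema are placeholders, and the actual S3 transfer is handled by the configured Hadoop filesystem (e.g. hadoop-aws), not by Parquet itself:

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class S3WriteSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical Avro schema for illustration.
    Schema schema = SchemaBuilder.record("Event").fields()
        .requiredLong("id").requiredString("name").endRecord();

    Configuration conf = new Configuration();
    // Placeholder bucket/key; credentials and endpoint come from the
    // s3a filesystem configuration, not from this code.
    Path path = new Path("s3a://my-bucket/events/part-0.parquet");

    try (ParquetWriter<GenericRecord> writer = AvroParquetWriter
        .<GenericRecord>builder(path)
        .withSchema(schema)
        .withConf(conf)
        .withCompressionCodec(CompressionCodecName.SNAPPY)
        .build()) {
      GenericRecord record = new GenericData.Record(schema);
      record.put("id", 1L);
      record.put("name", "example");
      writer.write(record);
    }
  }
}
```

Exceptions like the `SdkClientException` above typically surface from the underlying filesystem layer during upload or checksum verification, which is why they appear even though the writer itself completed successfully.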
I'm unable to create a bloom filter for a field when I perform writes with repeating values. The bloom filter returned is null when I try to read such a Parquet file....
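For reference on the write-side configuration involved here, bloom filters are opt-in per column on the `ParquetWriter` builder. A sketch, assuming a hypothetical column named `name` and an example output path:

```java
import org.apache.hadoop.fs.Path;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.example.ExampleParquetWriter;

public class BloomFilterWriteSketch {
  static ParquetWriter<Group> openWriter(Path path) throws Exception {
    return ExampleParquetWriter.builder(path)
        // Enable a bloom filter for the "name" column (off by default).
        .withBloomFilterEnabled("name", true)
        // Optionally size it via the expected number of distinct values;
        // 100_000 here is an illustrative guess, not a recommendation.
        .withBloomFilterNDV("name", 100_000L)
        .build();
  }
}
```

Note that a column with very few distinct values (heavily repeating data) gives the writer little reason to emit a useful filter, which may interact with the null-on-read behavior reported above.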
I tried to use the `appendFile` method of `ParquetFileWriter` to merge several smaller Parquet files into one large Parquet file. After I finished the merge, I tried deleting the smaller...
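A minimal sketch of the `appendFile` merge flow described above, assuming all inputs share the same schema (the paths and the `schema` variable are placeholders):

```java
import java.util.Collections;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.hadoop.ParquetFileWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.util.HadoopInputFile;
import org.apache.parquet.hadoop.util.HadoopOutputFile;
import org.apache.parquet.schema.MessageType;

public class MergeSketch {
  static void merge(List<Path> inputs, Path merged, MessageType schema,
                    Configuration conf) throws Exception {
    ParquetFileWriter writer = new ParquetFileWriter(
        HadoopOutputFile.fromPath(merged, conf), schema,
        ParquetFileWriter.Mode.CREATE,
        ParquetWriter.DEFAULT_BLOCK_SIZE,
        ParquetWriter.MAX_PADDING_SIZE_DEFAULT);
    writer.start();
    for (Path input : inputs) {
      // Copies row groups byte-for-byte without decoding pages.
      writer.appendFile(HadoopInputFile.fromPath(input, conf));
    }
    writer.end(Collections.emptyMap());
  }
}
```

Because `appendFile` copies row groups as-is, the merged file keeps the small row groups of its inputs; it concatenates rather than compacts.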
Motivation: The current behavior of ParquetFileReader#readNextRowGroup is to eagerly enumerate all chunks in the row group and then read all pages in each chunk. For distributed data workloads, this can cause...
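To make the eager behavior concrete, here is a sketch of the typical `readNextRowGroup` loop; each call materializes the pages of the whole row group before returning (the file path is a placeholder):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.column.page.PageReadStore;
import org.apache.parquet.hadoop.ParquetFileReader;
import org.apache.parquet.hadoop.util.HadoopInputFile;

public class RowGroupScanSketch {
  public static void main(String[] args) throws Exception {
    Path path = new Path("/tmp/example.parquet"); // placeholder path
    try (ParquetFileReader reader = ParquetFileReader.open(
        HadoopInputFile.fromPath(path, new Configuration()))) {
      PageReadStore rowGroup;
      // Each call reads the pages of an entire row group up front,
      // which is the eager behavior the motivation above refers to.
      while ((rowGroup = reader.readNextRowGroup()) != null) {
        System.out.println("row group with " + rowGroup.getRowCount() + " rows");
      }
    }
  }
}
```

A lazier variant would defer page I/O until a column is actually consumed, which matters when a distributed task only touches a few columns of a wide row group.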