
Upserts, Deletes And Incremental Processing on Big Data.

Results 906 hudi issues

Deltastreamer running with CloudWatch metrics isn't shutting down properly. This is in non-continuous mode. DeltaSync and the Spark context say they are closing, but the JVM is not exiting, everything...

priority:critical
deltastreamer

**Summary** During clustering, Hudi creates a duplicate parquet file with the same file group ID and identical content. One of the two files is later marked as a duplicate and deleted....

priority:critical
table-service

### Describe the problem I'm using a Spark job running on EMR to insert data using hudi (0.9.0). The inserts are working as expected and it stores parquet files in...

meta-sync
priority:critical
spark

Hudi version: 0.11.1 Spark version: 3.1.1 Storage: S3 AWS Glue: 3 Function ```scala import org.apache.spark.sql.{functions => fn} def readAndShow(path: String) { val df = spark.read.format("hudi").load(path) df.select(fn.min(fn.col("updated_at")), fn.min(fn.col("_hoodie_commit_time"))) show false val...

priority:major
spark
reader-core
incremental-query

**Describe the problem you faced** The Hudi CLI returned an empty result after running the command `show fsview all`. ![image](https://user-images.githubusercontent.com/7007327/180346750-6a55f472-45ac-46cf-8185-3c4fc4c76434.png) The type of table t1 is COW and I am sure that the...

priority:minor
cli

**Describe the problem you faced** After upgrading to 0.11.1, the deltastreamer is failing to write to a 6GB bucket. It is failing...

performance
priority:major
writer-core
on-call-triaged

**_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Join the mailing list to engage in conversations and get faster support at [email protected]. - If you...

priority:critical
writer-core
table-service

**Describe the problem you faced** Our Hudi data lake is heavily partitioned by datasource, year, and month. We have 1000 datasources currently loaded into the lake, and are looking to...

priority:major
metadata

**Describe the problem you faced** I'm using Hudi DeltaStreamer in continuous mode with a Kafka source. Whenever a Kafka offset expires, the job fails with offset out of range...

priority:critical
deltastreamer
pre-0.10.0

**_Tips before filing an issue_** - Have you gone through our [FAQs](https://hudi.apache.org/learn/faq/)? - Join the mailing list to engage in conversations and get faster support at [email protected]. - If you...

priority:critical
deltastreamer
pre-0.10.0