Rajesh Mahindra

12 comments by Rajesh Mahindra

@ankitchandnani I was able to reproduce the error. It appears to be caused by the older spark-avro version (3.0.1, which is used with Hudi 0.9) that doesn't seem to support...

I was able to reproduce it by adding the following line to my source (see the sketch below): newDataSet = newDataSet.withColumn("invalidDates", functions.lit("1000-01-11").cast(DataTypes.DateType)); Full stack trace here: https://gist.github.com/rmahindra123/4ab3614ef6ce30ee2c72499f2633de57
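For reference, here is a minimal, self-contained sketch of that repro, assuming Spark is on the classpath; the class name, app name, and the id column are illustrative placeholders rather than the actual source used in the issue:

```java
import java.util.Collections;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.functions;
import org.apache.spark.sql.types.DataTypes;

public class InvalidDateRepro {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("invalid-date-repro")
        .master("local[1]")
        .getOrCreate();

    // Stand-in for the source batch being ingested.
    Dataset<Row> newDataSet = spark
        .createDataset(Collections.singletonList("key1"), Encoders.STRING())
        .toDF("id");

    // The line from the comment above: a pre-1582 date literal cast to DateType.
    newDataSet = newDataSet.withColumn(
        "invalidDates",
        functions.lit("1000-01-11").cast(DataTypes.DateType));

    newDataSet.show();
    spark.stop();
  }
}
```

Writing a frame like this through a Hudi writer built on the older spark-avro is what surfaced the rebase error in the gist above.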

Confirmed that #6352 resolves the issue after adding the following config: --conf spark.sql.avro.datetimeRebaseModeInWrite=LEGACY
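If you prefer to set it programmatically rather than on spark-submit, a rough sketch follows; the class and app names are placeholders, and the config key is the same one quoted in the comment above:

```java
import org.apache.spark.sql.SparkSession;

public class LegacyRebaseSession {
  public static void main(String[] args) {
    // Programmatic equivalent of: --conf spark.sql.avro.datetimeRebaseModeInWrite=LEGACY
    SparkSession spark = SparkSession.builder()
        .appName("legacy-rebase-example")
        .master("local[1]")
        .config("spark.sql.avro.datetimeRebaseModeInWrite", "LEGACY")
        .getOrCreate();

    // ... perform the Hudi write with this session ...

    spark.stop();
  }
}
```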

@haripriyarhp The current Kafka connector only supports inserts, not updates. Could you clarify the comment below: >> Later sent 100 more new messages + some updates + some duplicates...

@haripriyarhp Can you verify that all .log files were compacted into parquet files after you ran compaction? I want to make sure no records remain in the log files when querying.

Also, could you run a query against the parquet files to check whether all records are in Hudi (see the sketch below for one way to check both)? It would also be helpful if you could share your .hoodie folder here. Thanks.
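A rough sketch of one way to check both with Spark's Java API; the base path, class name, and app name are placeholders, and depending on your Hudi version the snapshot read may need a path glob such as basePath + "/*/*":

```java
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CompactionCheck {
  public static void main(String[] args) throws Exception {
    String basePath = "/tmp/hudi/my_table"; // placeholder: the table base path

    SparkSession spark = SparkSession.builder()
        .appName("compaction-check")
        .master("local[2]")
        .getOrCreate();

    // 1) Count leftover .log files under the table base path.
    FileSystem fs = new Path(basePath)
        .getFileSystem(spark.sparkContext().hadoopConfiguration());
    RemoteIterator<LocatedFileStatus> files = fs.listFiles(new Path(basePath), true);
    long logFiles = 0;
    while (files.hasNext()) {
      LocatedFileStatus status = files.next();
      if (status.getPath().getName().contains(".log.")) {
        logFiles++;
      }
    }
    System.out.println("Remaining log files: " + logFiles);

    // 2) Snapshot-read the table and count records.
    Dataset<Row> df = spark.read().format("hudi").load(basePath);
    System.out.println("Record count: " + df.count());

    spark.stop();
  }
}
```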

@ROOBALJINDAL When using DebeziumSource, please do not set --schemaprovider-class, since the schema is handled by the source itself. Can you try again after removing that config? I see that you did...

For the multi-table Deltastreamer, ingestion runs sequentially, so it will ingest table1 first and then table2. Let me know if you are still facing issues.

@XuQianJin-Stars Can you elaborate on the setup, environment, and steps to reproduce?