Rajesh Mahindra

12 comments by Rajesh Mahindra

@ankitchandnani I was able to reproduce the error. It appears to be caused by the older spark-avro version (3.0.1, which is used with Hudi 0.9) that doesn't seem to support...

I was able to reproduce it by adding the following line to my source (see the sketch below): newDataSet = newDataSet.withColumn("invalidDates", functions.lit("1000-01-11").cast(DataTypes.DateType)); Full stack trace here: https://gist.github.com/rmahindra123/4ab3614ef6ce30ee2c72499f2633de57
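For reference, here is a minimal, self-contained sketch of that repro, assuming Spark is on the classpath; the class name, app name, and the id column are illustrative placeholders rather than the actual source used in the issue:

```java
import java.util.Collections;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.functions;
import org.apache.spark.sql.types.DataTypes;

public class InvalidDateRepro {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("invalid-date-repro")
        .master("local[1]")
        .getOrCreate();

    // Stand-in for the source batch being ingested.
    Dataset<Row> newDataSet = spark
        .createDataset(Collections.singletonList("key1"), Encoders.STRING())
        .toDF("id");

    // The line from the comment above: a pre-1582 date literal cast to DateType.
    newDataSet = newDataSet.withColumn(
        "invalidDates",
        functions.lit("1000-01-11").cast(DataTypes.DateType));

    newDataSet.show();
    spark.stop();
  }
}
```

Writing a frame like this through a Hudi writer built on the older spark-avro is what surfaced the rebase error in the gist above.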

Confirmed that #6352 resolves the issue after adding the following config: --conf spark.sql.avro.datetimeRebaseModeInWrite=LEGACY
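If you prefer to set it programmatically rather than on spark-submit, a rough sketch follows; the class and app names are placeholders, and the config key is the same one quoted in the comment above:

```java
import org.apache.spark.sql.SparkSession;

public class LegacyRebaseSession {
  public static void main(String[] args) {
    // Programmatic equivalent of: --conf spark.sql.avro.datetimeRebaseModeInWrite=LEGACY
    SparkSession spark = SparkSession.builder()
        .appName("legacy-rebase-example")
        .master("local[1]")
        .config("spark.sql.avro.datetimeRebaseModeInWrite", "LEGACY")
        .getOrCreate();

    // ... perform the Hudi write with this session ...

    spark.stop();
  }
}
```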

@haripriyarhp The current Kafka connector only supports inserts, not updates. Could you clarify the comment below: >> Later sent 100 more new messages + some updates + some duplicates...

@haripriyarhp Can you verify that all .log files were compacted into parquet files after you ran compaction? I want to make sure no records remain in the log files when querying.

Also, could you run a query against the parquet files to check whether all records are in Hudi (see the sketch below for one way to check both)? It would also be helpful if you could share your .hoodie folder here. Thanks.
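A rough sketch of one way to check both with Spark's Java API; the base path, class name, and app name are placeholders, and depending on your Hudi version the snapshot read may need a path glob such as basePath + "/*/*":

```java
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class CompactionCheck {
  public static void main(String[] args) throws Exception {
    String basePath = "/tmp/hudi/my_table"; // placeholder: the table base path

    SparkSession spark = SparkSession.builder()
        .appName("compaction-check")
        .master("local[2]")
        .getOrCreate();

    // 1) Count leftover .log files under the table base path.
    FileSystem fs = new Path(basePath)
        .getFileSystem(spark.sparkContext().hadoopConfiguration());
    RemoteIterator<LocatedFileStatus> files = fs.listFiles(new Path(basePath), true);
    long logFiles = 0;
    while (files.hasNext()) {
      LocatedFileStatus status = files.next();
      if (status.getPath().getName().contains(".log.")) {
        logFiles++;
      }
    }
    System.out.println("Remaining log files: " + logFiles);

    // 2) Snapshot-read the table and count records.
    Dataset<Row> df = spark.read().format("hudi").load(basePath);
    System.out.println("Record count: " + df.count());

    spark.stop();
  }
}
```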

@ROOBALJINDAL When using DebeziumSource, please do not set --schemaprovider-class, since the schema is handled by the source itself. Can you try again after removing that config? I see that you did...

For the multi-table Deltastreamer, ingestion runs sequentially, so it will ingest table1 first and then table2. Let me know if you are still facing issues.

@XuQianJin-Stars Can you elaborate on the setup, environment, and steps to reproduce?