[SUPPORT]HoodieException: Commit 20220509105215 failed and rolled-back ! at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:492)
Hi Team, our Hudi jobs are failing with the exception below. Can you please help me with this?
HoodieException: Commit 20220509105215 failed and rolled-back ! at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:492)
Environment Description
- Hudi version: 0.8.0
- Spark version: 3.2.1
- Storage (HDFS/S3/GCS..): Azure Blob Storage
We are running the Apache Hudi Spark job in Azure Databricks, using the dependencies below:
org.apache.spark:spark-avro_2.12:3.1.1
org.apache.hudi:hudi-utilities-bundle_2.12:0.8.0
org.apache.hudi:hudi-spark3-bundle_2.12:0.8.0
Can you share the entire stacktrace?
@pratyakshsharma Hi, what is a stacktrace? I am not aware of it, can you please help with more details?
@nleena123 I meant to say please share the detailed error log and not just the exception message. :)
@nsivabalan Can you help us? This is happening on our production server.
Can you furnish the write configs you use to write to Hudi?
#hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.CustomKeyGenerator
hoodie.datasource.write.hive_styling_partioning=true
hoodie.datasource.write.partitionpath.urlencode=false
#hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator
hoodie.datasource.write.recordkey.field=interaction_id
hoodie.datasource.write.partitionpath.field=create_datetime:TIMESTAMP
hoodie.deltastreamer.keygen.timebased.timestamp.type=DATE_STRING
hoodie.deltastreamer.keygen.timebased.input.dateformat=yyyyMMdd HH:mm
hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy/MM
hoodie.deltastreamer.keygen.timebased.timezone=UTC-6:00
hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.CustomKeyGenerator
#20211119 13:39
#hoodie.index.type=GLOBAL_BLOOM
#hoodie.bloom.index.update.partition.path=true
schema provider configs
#hoodie.deltastreamer.schemaprovider.registry.url=https://azure-prod-schema-registry.extscloud.com/subjects/async.messaging.interaction.metrics-value/versions/3
hoodie.deltastreamer.schemaprovider.registry.url=https://azure-prod-schema-registry.extscloud.com/subjects/async.messaging.interaction.metrics-value/versions/latest
Kafka Source
hoodie.deltastreamer.source.kafka.topic=async.messaging.interaction.metrics
#Kafka props
#hoodie.auto.commit=true
enable.auto.commit=true
#hoodie.deltastreamer.kafka.source.maxEvents=1000000
group.id=async.messaging.metrics
bootstrap.servers=prod-hcc-kaas-central-13.centralus.cloudapp.azure.com:443
#metadata.broker.list=kaas-prod-elr-a.optum.com:443
auto.offset.reset=latest
auto.reset.offsets=latest
#schema.registry.url=https://azure-test-schema-registry.extscloud.com
schema.registry.url=https://azure-prod-schema-registry.extscloud.com:443
schema.registry.ssl.keystore.location=/dbfs/mnt/ulp/artifacts/topics/certs/keystore.jks
schema.registry.ssl.keystore.password=prmcert
schema.registry.ssl.truststore.location=/dbfs/mnt/ulp/artifacts/topics/certs/truststore.jks
schema.registry.ssl.truststore.password=prmcert
schema.registry.ssl.key.password=prmcert
security.protocol=SSL
sasl.mechanism=PLAIN
ssl.keystore.location=/dbfs/mnt/ulp/artifacts/topics/certs/keystore.jks
ssl.keystore.password=*******
ssl.truststore.location=/dbfs/mnt/ulp/artifacts/topics/certs/truststore.jks
ssl.truststore.password=**********
ssl.key.password=prmcert
Sorry for being late here. I see the class DFAvroKafkaSource in the logs. Do you have your own custom class for reading the data?
Yes, we use the DFAvroKafkaSource.java file below to read the data; it extends AvroSource. DF3HoodieDeltaStreamer.java extends HoodieDeltaStreamer. I have also attached the pom.xml file in code.zip in the folder. Please have a look at it and let me know if any changes are required in my code.
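For readers following this thread: a custom source like this normally extends Hudi's AvroSource and implements fetchNewData, roughly as in the sketch below. This is only an illustrative outline, not the actual DFAvroKafkaSource (which is in the attached code.zip): the class name is made up, the body is stubbed out, and exact package names may vary slightly across Hudi versions.

import org.apache.avro.generic.GenericRecord;
import org.apache.hudi.common.config.TypedProperties;
import org.apache.hudi.common.util.Option;
import org.apache.hudi.utilities.schema.SchemaProvider;
import org.apache.hudi.utilities.sources.AvroSource;
import org.apache.hudi.utilities.sources.InputBatch;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

// Illustrative skeleton of a custom Avro source for DeltaStreamer (not the real DFAvroKafkaSource).
public class MyAvroKafkaSource extends AvroSource {

  public MyAvroKafkaSource(TypedProperties props, JavaSparkContext sparkContext,
                           SparkSession sparkSession, SchemaProvider schemaProvider) {
    super(props, sparkContext, sparkSession, schemaProvider);
  }

  @Override
  protected InputBatch<JavaRDD<GenericRecord>> fetchNewData(Option<String> lastCkptStr, long sourceLimit) {
    // The real implementation would read up to sourceLimit Avro records from Kafka starting at
    // the last checkpoint, enrich them (e.g. add CDC_TS), and return them with the new checkpoint.
    // An empty batch is returned here only to keep the sketch self-contained.
    return new InputBatch<>(Option.empty(), lastCkptStr.isPresent() ? lastCkptStr.get() : null);
  }
}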
And can you share your DeltaStreamer configs please?
Are you trying this out for the first time and running into issues, or was this running OK for some time and then you suddenly hit this after upgrading or changing any configs?
Hi @nsivabalan, the attached property file contains all the configs we use for this job. We passed the following arguments to the Databricks job (we are running the Hudi job through Azure Databricks):
["--table-type","COPY_ON_WRITE","--source-ordering-field","CDC_TS","--source-class","com.optum.df.hudi.sources.DFAvroKafkaSource","--target-base-path","/mnt/ulp/dataassets-lake/metrics/","--target-table","metrics","--schemaprovider-class","org.apache.hudi.utilities.schema.SchemaRegistryProvider","--props","/mnt/ulp/artifacts/properties/metrics.properties"]
Note: No, we didn't change any configs.
I see hoodie.datasource.write.hive_styling_partioning, which should be changed to hoodie.datasource.write.hive_style_partitioning.
Looking at your timeline, 20211211183554__commit__COMPLETED is the last time the job ran successfully; after that you have been running into issues. So we want to understand whether you changed anything since this last successful commit.
@pratyakshsharma I changed it to hoodie.datasource.write.hive_style_partitioning=true and reran the job, but I still see the same issue. I have not made any changes after 20211211183554__commit__COMPLETED.
Yes, changing to hoodie.datasource.write.hive_style_partitioning will not fix your issue. That was just a typo in your configs (unrelated to this error), so I only highlighted it.
Also, I think there is some confusion between the code you shared and the logs you shared. In the code, I see multiple log statements that do not appear in the log file you shared. Also, the schema being registered in the logs does not have the CDC_TS field. Can you share the latest logs? That might help in debugging the issue better.
We are adding the new field CDC_TS to the Kafka data through code present in the DF3HoodieDeltaStreamer.java program. Code:

static GenericRecord getDataWithCDC_TS(GenericRecord record, String ts) {
  Schema base = record.getSchema();
  // Copy all fields from the incoming record's schema.
  List<Schema.Field> baseFields = base.getFields().stream()
      .map(field -> {
        Schema.Field f = new Schema.Field(field.name(), field.schema(), field.doc(), field.defaultVal());
        field.aliases().forEach(f::addAlias);
        return f;
      })
      .collect(Collectors.toList());
  List<Schema.Field> newFields = new ArrayList<>(baseFields);
  // CDC_TS is added as an optional string (union of null and string).
  ArrayList<Schema> optionalString = new ArrayList<>();
  optionalString.add(Schema.create(Schema.Type.NULL));
  optionalString.add(Schema.create(Schema.Type.STRING));
  //JsonNode nullDefault = JsonNodeFactory.instance.nullNode();
  //Schema.Field f = new Schema.Field(CDC_TS, Schema.createUnion(optionalString), CDC_TS, nullDefault);
  Schema.Field f = new Schema.Field(CDC_TS, Schema.createUnion(optionalString), CDC_TS, null);
  if (!baseFields.contains(f)) {
    newFields.add(f);
  }
  // Build a new record schema that includes the extra CDC_TS field.
  Schema newSchema = Schema.createRecord(
      base.getName(),
      "New schema by adding a new field,CDC_TS",
      base.getNamespace(), base.isError(),
      newFields);
  LOG.info("newSchema is:" + newSchema);
  // Copy the original field values into a record of the new schema and set CDC_TS.
  GenericRecord newRecord = new GenericData.Record(newSchema);
  base.getFields().forEach(field -> newRecord.put(field.name(), record.get(field.name())));
  newRecord.put(CDC_TS, ts);
  return newRecord;
}
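For context, here is a hedged sketch of how such a helper is typically applied to each record in an incoming batch before DeltaStreamer writes it. This is not the exact code from DF3HoodieDeltaStreamer.java; the method name addCdcTs, the JavaRDD input, and the epoch-millis timestamp format are assumptions for illustration.

// Hypothetical usage, assumed to live in the same class as getDataWithCDC_TS.
static JavaRDD<GenericRecord> addCdcTs(JavaRDD<GenericRecord> avroRecords) {
  // getDataWithCDC_TS rebuilds the enriched schema for every record; for large batches
  // the new schema could be computed once and reused.
  return avroRecords.map(rec ->
      getDataWithCDC_TS(rec, String.valueOf(System.currentTimeMillis())));
}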
I have attached the latest log file, please have a look at it: log4j-active_latest.txt
I have the same problem, and I haven't solved it yet.
Hi @xushiyan, @pratyakshsharma, I have been seeing this issue for a very long time. Can you please give an update on this, or suggest resolution steps?
Hi Team, can you please update me on the above issue?
Hi @qianchutao, were you able to fix the issue? And can you please let me know which version of Hudi you are using?
I am also facing a similar exception, using EMR on AWS with Kafka. The exception is not self-explanatory about whether any config is missing or what exactly is wrong.
Exception:
22/08/09 07:30:39 INFO TaskSetManager: Finished task 0.0 in stage 66.0 (TID 2033) in 113 ms on ip-10-151-46-136.us-west-2.compute.internal (executor 1) (1/2)
22/08/09 07:30:39 INFO TaskSetManager: Finished task 1.0 in stage 66.0 (TID 2034) in 140 ms on ip-10-151-46-136.us-west-2.compute.internal (executor 1) (2/2)
22/08/09 07:30:39 INFO YarnScheduler: Removed TaskSet 66.0, whose tasks have all completed, from pool
22/08/09 07:30:39 INFO DAGScheduler: ResultStage 66 (collectAsMap at HoodieSparkEngineContext.java:151) finished in 0.151 s
22/08/09 07:30:39 INFO DAGScheduler: Job 32 is finished. Cancelling potential speculative or zombie tasks for this job
22/08/09 07:30:39 INFO YarnScheduler: Killing all running tasks in stage 66: Stage finished
22/08/09 07:30:39 INFO DAGScheduler: Job 32 finished: collectAsMap at HoodieSparkEngineContext.java:151, took 0.153794 s
22/08/09 07:30:39 ERROR HoodieDeltaStreamer: Shutting down delta-sync due to exception
org.apache.hudi.exception.HoodieException: Commit 20220809073005189 failed and rolled-back !
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:649)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:331)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:675)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
22/08/09 07:30:39 ERROR HoodieAsyncService: Service shutdown with error
java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Commit 20220809073005189 failed and rolled-back !
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
    at org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:103)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$1(HoodieDeltaStreamer.java:189)
    at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:186)
    at org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer.sync(HoodieMultiTableDeltaStreamer.java:416)
    at org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer.main(HoodieMultiTableDeltaStreamer.java:247)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1000)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1089)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1098)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.hudi.exception.HoodieException: Commit 20220809073005189 failed and rolled-back !
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:709)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hudi.exception.HoodieException: Commit 20220809073005189 failed and rolled-back !
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:649)
    at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:331)
    at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:675)
Hi @qianchutao, were you able to fix the issue? And can you please let me know which version of Hudi you are using?
Were you able to fix this issue?
@nleena123 @ROOBALJINDAL: it looks like the stacktrace does not have sufficient info to triage. So, can you folks enable debug logs and share what you see? We should probably get some insights into what's happening. Also, if we can somehow get reproducible code, it would really help us triage this faster, since timeline events are very tough to grok by manually looking at the logs.
@nsivabalan @pratyakshsharma I tried enabling debug logs by changing the root logger from log4j.rootLogger=INFO to log4j.rootLogger=DEBUG in the Hudi jar for the following files:
log4j.properties
log4j-surefire.properties
log4j-surefire-quiet.properties
Also updated some other properties in these files:
log4j.rootLogger=DEBUG, CONSOLE
log4j.logger.org.apache=DEBUG
log4j.logger.org.apache.hudi=DEBUG
I am using this custom Hudi jar with the following spark-submit command:
spark-submit \
--class org.apache.hudi.utilities.deltastreamer.HoodieMultiTableDeltaStreamer s3://slava-redshift-test/hudi/roobal-hudi-Debug-logger/hudi-utilities-bundle_2.12-0.11.0_edfx_debug.jar \
I have also tried this:
spark-submit \
--conf spark.executor.extraJavaOptions=-Dlog4j.debug=true \
Can you help with what I am missing?
Note: I am using an EMR cluster on AWS.
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Set the default spark-shell log level to WARN. When running the spark-shell, the
# log level for this class is used to overwrite the root logger's log level, so that
# the user can have different defaults for the shell and regular Spark apps.
log4j.logger.org.apache.spark.repl.Main=WARN
# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
# Example
log4j.logger.org.apache.hudi=DEBUG
Above is a sample log4j file. You can point the driver and executors at it with:
--conf spark.driver.extraJavaOptions="-Dlog4j.configuration=file:/home/hadoop/log4j.properties" --conf spark.executor.extraJavaOptions="-Dlog4j.configuration=file:/home/hadoop/log4j.properties"
@ROOBALJINDAL: can we have an update?
@nsivabalan Yeah, it worked out for me. I logged a separate issue and was able to get help and resolve it. Thank you team 👍 FYI, this original issue was logged by someone else, so I can't close it.
Oh ok, got it, thanks. May I know which issue you are referring to?
@nleena123: are you still facing issues, or are we good to close this out?