[Bug] [Zeta] savePointJob Doesn't work
Search before asking
- [X] I had searched in the issues and found no similar issues.
What happened
I am using the SeaTunnel client and wanted to try out the savepoint function. I launched a job with ./bin/seatunnel.sh -c /Users/liu/Data/10_Code/iWhalecloud/seatunnel-web/profile/13244296710144.conf and then triggered a savepoint with ./bin/seatunnel.sh -s 834694399370723329. I waited for a long time, but the submitted job kept running, even after all my data had been moved.
SeaTunnel Version
2.3.4
SeaTunnel Config
{
  "env" : {
    "job.mode" : "BATCH",
    "job.name" : "SeaTunnel_Job"
  },
  "source" : [
    {
      "password" : "wdp123",
      "driver" : "oracle.jdbc.driver.OracleDriver",
      "parallelism" : "32",
      "query" : "SELECT \"ID\", \"NAME\" FROM \"WHS\".\"TEST3\"",
      "connection_check_timeout_sec" : 30,
      "fetch_size" : "10000",
      "result_table_name" : "Table13244444398848",
      "plugin_name" : "Jdbc",
      "user" : "system",
      "url" : "jdbc:oracle:thin:@10.45.46.116:8085:XE"
    }
  ],
  "transform" : [],
  "sink" : [
    {
      "batch_size" : "10000",
      "max_retries" : "1",
      "source_table_name" : "Table13244444398848",
      "max_commit_attempts" : 3,
      "auto_commit" : "true",
      "plugin_name" : "Clickhouse",
      "url" : "jdbc:clickhouse://10.45.151.152:8123",
      "is_exactly_once" : "false",
      "database" : "AA",
      "password" : "Pass-123-whs",
      "transaction_timeout_sec" : -1,
      "driver" : "ru.yandex.clickhouse.ClickHouseDriver",
      "support_upsert_by_query_primary_key_exist" : "false",
      "Clickhouse" : "true",
      "host" : "10.45.151.152:8123",
      "connection_check_timeout_sec" : 30,
      "generate_sink_sql" : "true",
      "user" : "default",
      "table" : "tb_test3",
      "username" : "default"
    }
  ]
}
Running Command
1. ./bin/seatunnel.sh -c /Users/liu/Data/10_Code/iWhalecloud/seatunnel-web/profile/13244296710144.conf
2. ./bin/seatunnel.sh -s 834694399370723329
Error Exception
savePointJob doesn't work.
Zeta or Flink or Spark Version
No response
Java or Scala Version
No response
Screenshots
No response
Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
I found that there is lock contention in org.apache.seatunnel.engine.server.task.flow.SourceFlowLifeCycle#triggerBarrier. When the savepoint barrier runs into the synchronized (collector.getCheckpointLock()) block, it cannot acquire the checkpoint lock until all records have been collected.
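To make the suspected contention concrete, here is a minimal, self-contained sketch (not SeaTunnel's actual code): one thread holds the checkpoint lock for its entire emit loop, so a barrier trigger that synchronizes on the same lock blocks until the loop finishes.

import java.util.concurrent.TimeUnit;

public class CheckpointLockContention {
    private static final Object CHECKPOINT_LOCK = new Object();

    public static void main(String[] args) throws InterruptedException {
        // Reader thread: holds the lock for the whole emit loop, analogous
        // to collecting records under collector.getCheckpointLock().
        Thread reader = new Thread(() -> {
            synchronized (CHECKPOINT_LOCK) {
                for (int i = 0; i < 5; i++) {
                    System.out.println("emit record " + i);
                    sleep(500);
                }
            }
        });
        // Barrier thread: analogous to triggerBarrier, which needs the same lock.
        Thread barrier = new Thread(() -> {
            long start = System.nanoTime();
            synchronized (CHECKPOINT_LOCK) {
                long waitedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
                System.out.println("barrier acquired lock after " + waitedMs + " ms");
            }
        });
        reader.start();
        sleep(100); // let the reader grab the lock first
        barrier.start();
        reader.join();
        barrier.join();
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}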
Can you give a detailed log of the Zeta engine?
It seems it is not lock contention. Rather, org.apache.seatunnel.connectors.seatunnel.jdbc.internal.JdbcInputFormat#resultSet.next() keeps returning true, so the lock is never released until every ResultSet record has been collected.
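A simplified sketch of that read pattern (illustrative only, not the real JdbcInputFormat; it assumes the H2 driver on the classpath so the demo is runnable): the whole ResultSet is drained inside one synchronized block, so with a single giant split the lock is held until rs.next() finally returns false.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class SingleSplitReadLoop {
    public static void main(String[] args) throws SQLException {
        Object checkpointLock = new Object();
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:demo");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            // The whole split is drained while holding the checkpoint lock;
            // with one giant split, rs.next() stays true for a very long time,
            // and a pending savepoint barrier cannot get in between records.
            synchronized (checkpointLock) {
                while (rs.next()) {
                    collect(rs.getObject(1));
                }
            }
        }
    }

    private static void collect(Object record) {
        System.out.println("collected: " + record); // stand-in for emitting downstream
    }
}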
Yes. I'm not sure whether your source table has a primary key; because you didn't set a partition column, there may be only one split, and the savepoint then waits for the currently executing split to finish. So we need to observe some key information from the Zeta engine logs.
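If the read should be split, the SeaTunnel JDBC source can partition it. A sketch based on the documented partition_column and partition_num options, assuming ID is a numeric column (the values below are illustrative):

"source" : [
  {
    "plugin_name" : "Jdbc",
    "url" : "jdbc:oracle:thin:@10.45.46.116:8085:XE",
    "driver" : "oracle.jdbc.driver.OracleDriver",
    "user" : "system",
    "password" : "wdp123",
    "query" : "SELECT \"ID\", \"NAME\" FROM \"WHS\".\"TEST3\"",
    "partition_column" : "ID",
    "partition_num" : 32,
    "result_table_name" : "Table13244444398848"
  }
]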
Can you give a detailed log of the zeta engine?
It seems like not the lock competition, but the
org.apache.seatunnel.connectors.seatunnel.jdbc.internal.JdbcInputFormat#resultSet.next()
always betrue
. So the lock will not give always utill all ResultSet record has been collected.Yes, I'm not sure if your source table has a primary key because you didn't set the partition column, which may result in only one split being present, and then savepoint is waiting for the currently split executing. So we need to observe some key information from the zeta engine logs.
Yes, it is just one giant split, because I didn't set any special config, and the table has just two fields, ID and NAME, with no keys.
Should a limit be set, in case a giant split holds the checkpoint lock for a long time? @hailin0 @Hisoka-X
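A hypothetical sketch of the kind of limit suggested above (not existing SeaTunnel code; the 10,000-record batch size is arbitrary): release and re-acquire the checkpoint lock between batches, so a pending barrier can slot in mid-split.

import java.util.Iterator;
import java.util.stream.IntStream;

public class BatchedLockRelease {
    private static final int RECORDS_PER_LOCK_HOLD = 10_000; // arbitrary limit

    public static void main(String[] args) {
        Object checkpointLock = new Object();
        Iterator<Integer> records = IntStream.range(0, 35_000).iterator();
        while (records.hasNext()) {
            // Hold the lock for at most one batch, then release it so a
            // pending savepoint barrier waiting on the same lock can run.
            synchronized (checkpointLock) {
                int emitted = 0;
                while (records.hasNext() && emitted < RECORDS_PER_LOCK_HOLD) {
                    collect(records.next());
                    emitted++;
                }
            }
        }
    }

    private static void collect(int record) {
        // stand-in for emitting a record downstream
    }
}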
The minimum granularity of savepoint is split. If you perform a savepoint in the middle of reading a split, the status file obtained may also be wrong.
So there is no way to make a savepoint or checkpoint for a table without any keys but with a huge number of records?
Just cancel it. Restore cannot do anything even if you get the right state file.
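For reference, cancelling the stuck job from the client would look like this, assuming the 2.3.x CLI where -can is the cancel-job option:

./bin/seatunnel.sh -can 834694399370723329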