seatunnel [Bug] [Zeta] savePointJob Doesn't work

Search before asking

[X] I had searched in the issues and found no similar issues.

What happened

I am using the SeaTunnel client and wanted to try out the savepoint function. I launched a job with ./bin/seatunnel.sh -c /Users/liu/Data/10_Code/iWhalecloud/seatunnel-web/profile/13244296710144.conf and then ran ./bin/seatunnel.sh -s 834694399370723329 against it. I waited for a long time, but the submitted job was still running, even after all my data had been moved.

SeaTunnel Version

2.3.4

SeaTunnel Config

{
    "env" : {
        "job.mode" : "BATCH",
        "job.name" : "SeaTunnel_Job"
    },
    "source" : [
        {
            "password" : "wdp123",
            "driver" : "oracle.jdbc.driver.OracleDriver",
            "parallelism" : "32",
            "query" : "SELECT \"ID\", \"NAME\" FROM \"WHS\".\"TEST3\"",
            "connection_check_timeout_sec" : 30,
            "fetch_size" : "10000",
            "result_table_name" : "Table13244444398848",
            "plugin_name" : "Jdbc",
            "user" : "system",
            "url" : "jdbc:oracle:thin:@10.45.46.116:8085:XE"
        }
    ],
    "transform" : [],
    "sink" : [
        {
            "batch_size" : "10000",
            "max_retries" : "1",
            "source_table_name" : "Table13244444398848",
            "max_commit_attempts" : 3,
            "auto_commit" : "true",
            "plugin_name" : "Clickhouse",
            "url" : "jdbc:clickhouse://10.45.151.152:8123",
            "is_exactly_once" : "false",
            "database" : "AA",
            "password" : "Pass-123-whs",
            "transaction_timeout_sec" : -1,
            "driver" : "ru.yandex.clickhouse.ClickHouseDriver",
            "support_upsert_by_query_primary_key_exist" : "false",
            "Clickhouse" : "true",
            "host" : "10.45.151.152:8123",
            "connection_check_timeout_sec" : 30,
            "generate_sink_sql" : "true",
            "user" : "default",
            "table" : "tb_test3",
            "username" : "default"
        }
    ]
}

Running Command

1. ./bin/seatunnel.sh -c /Users/liu/Data/10_Code/iWhalecloud/seatunnel-web/profile/13244296710144.conf
2. ./bin/seatunnel.sh -s 834694399370723329

Error Exception

savePointJob doesn't work.

Zeta or Flink or Spark Version

No response

Java or Scala Version

No response

Screenshots

No response

Are you willing to submit PR?

[X] Yes I am willing to submit a PR!

Code of Conduct

[X] I agree to follow this project's Code of Conduct

Apr 22 '24 07:04 Jetiaime

I found that there is lock contention in org.apache.seatunnel.engine.server.task.flow.SourceFlowLifeCycle#triggerBarrier. The savepoint barrier runs inside synchronized (collector.getCheckpointLock()), and it does not get the checkpoint lock until all records have been collected.

May 09 '24 04:05 Jetiaime

Can you give a detailed log of the zeta engine?

May 09 '24 07:05 happyboy1024

Can you give a detailed log of the zeta engine?

It seems the cause is not lock contention; rather, org.apache.seatunnel.connectors.seatunnel.jdbc.internal.JdbcInputFormat#resultSet.next() keeps returning true, so the lock is not released until every record in the ResultSet has been collected.

May 09 '24 08:05 Jetiaime
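
The behaviour described above can be reproduced in isolation. The following is a minimal sketch, not SeaTunnel code: the class name CheckpointLockSketch, the method recordsSeenByBarrier, and the two threads are hypothetical stand-ins for the source reader loop and the savepoint barrier. It assumes only what the thread states, namely that the reader keeps the checkpoint lock while it drains a whole split and that the barrier needs the same lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-alone model of the reported behaviour; not SeaTunnel code.
public class CheckpointLockSketch {

    // Returns how many records had been collected when the "barrier" finally got the lock.
    static int recordsSeenByBarrier(int totalRecords) {
        final Object checkpointLock = new Object();
        final AtomicInteger collected = new AtomicInteger();
        final AtomicInteger seenByBarrier = new AtomicInteger(-1);
        final CountDownLatch readerHoldsLock = new CountDownLatch(1);

        // Reader: holds the lock for the whole split, like a loop over resultSet.next().
        Thread reader = new Thread(() -> {
            synchronized (checkpointLock) {
                readerHoldsLock.countDown();
                for (int i = 0; i < totalRecords; i++) {
                    collected.incrementAndGet();
                }
            }
        });

        // Barrier: starts only once the reader owns the lock, then needs the same lock.
        Thread barrier = new Thread(() -> {
            try {
                readerHoldsLock.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (checkpointLock) {
                seenByBarrier.set(collected.get());
            }
        });

        reader.start();
        barrier.start();
        try {
            reader.join();
            barrier.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for threads", e);
        }
        return seenByBarrier.get();
    }

    public static void main(String[] args) {
        // The barrier only ever observes the fully drained split.
        System.out.println(recordsSeenByBarrier(1_000_000));
    }
}
```

In this model the barrier always sees the full record count, which matches the observation that the savepoint only takes effect after the single split has been read to the end.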

Can you give a detailed log of the zeta engine?

It seems the cause is not lock contention; rather, org.apache.seatunnel.connectors.seatunnel.jdbc.internal.JdbcInputFormat#resultSet.next() keeps returning true, so the lock is not released until every record in the ResultSet has been collected.

Yes. I'm not sure whether your source table has a primary key. Because you didn't set a partition column, there may be only one split, and the savepoint then waits for the currently executing split to finish. So we need to look at some key information in the Zeta engine logs.

May 09 '24 09:05 happyboy1024
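
To illustrate why a partition column matters here, the sketch below divides a numeric key range into several contiguous splits. It is an assumption-laden illustration, not the connector's actual splitting algorithm: the class SplitRangeSketch, the method splitRanges, and the even-range strategy are all hypothetical, and it presumes a numeric column with known lower and upper bounds.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of range-based splitting; not the SeaTunnel JDBC connector's code.
public class SplitRangeSketch {

    // Divides the inclusive range [lower, upper] into at most partitionNum contiguous ranges.
    static List<long[]> splitRanges(long lower, long upper, int partitionNum) {
        if (partitionNum < 1 || upper < lower) {
            throw new IllegalArgumentException("need partitionNum >= 1 and lower <= upper");
        }
        List<long[]> ranges = new ArrayList<>();
        long total = upper - lower + 1;
        long base = total / partitionNum;      // minimum size of each range
        long remainder = total % partitionNum; // the first `remainder` ranges get one extra key
        long start = lower;
        for (int i = 0; i < partitionNum && start <= upper; i++) {
            long size = base + (i < remainder ? 1 : 0);
            long end = start + size - 1;
            ranges.add(new long[] {start, end});
            start = end + 1;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // One split: a savepoint has to wait for the whole table.
        for (long[] r : splitRanges(1, 100, 1)) {
            System.out.println(r[0] + ".." + r[1]);
        }
        // Four splits: there is a boundary after every 25 keys.
        for (long[] r : splitRanges(1, 100, 4)) {
            System.out.println(r[0] + ".." + r[1]);
        }
    }
}
```

With a single range the whole table is one unit of work, so nothing can interleave with it; with several ranges there is a boundary between splits at which a savepoint could be taken.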

Can you give a detailed log of the zeta engine?

It seems the cause is not lock contention; rather, org.apache.seatunnel.connectors.seatunnel.jdbc.internal.JdbcInputFormat#resultSet.next() keeps returning true, so the lock is not released until every record in the ResultSet has been collected.

Yes. I'm not sure whether your source table has a primary key. Because you didn't set a partition column, there may be only one split, and the savepoint then waits for the currently executing split to finish. So we need to look at some key information in the Zeta engine logs.

Yes, it is just one giant split, because I didn't set any special configuration, and the table has only two fields, ID and NAME, with no keys.

May 09 '24 09:05 Jetiaime

Should a limit be set so that a giant split cannot hold the checkpoint lock for a long time? @hailin0 @Hisoka-X

May 09 '24 09:05 Jetiaime

The minimum granularity of a savepoint is a split. If you perform a savepoint in the middle of reading a split, the resulting state file may also be wrong.

May 09 '24 09:05 Hisoka-X

The minimum granularity of a savepoint is a split. If you perform a savepoint in the middle of reading a split, the resulting state file may also be wrong.

So there is no way to take a savepoint or checkpoint for a table that has no keys but a huge number of records?

May 09 '24 10:05 Jetiaime

The minimum granularity of a savepoint is a split. If you perform a savepoint in the middle of reading a split, the resulting state file may also be wrong.

So there is no way to take a savepoint or checkpoint for a table that has no keys but a huge number of records?

Just cancel it. Restore cannot do anything even if you got the right state file.

May 09 '24 10:05 Hisoka-X
