
SeaTunnel with Spark: Caused by: java.lang.OutOfMemoryError: Java heap space

Open · cloudhuang opened this issue · 8 comments

Search before asking

  • [X] I had searched in the issues and found no similar issues.

What happened

java.sql.SQLException: Java heap space

SeaTunnel Version

apache-seatunnel-incubating-2.1.3

SeaTunnel Config

env {
  execution.parallelism = 4
  job.mode = "BATCH"

  spark.app.name = "SeaTunnel"
  spark.executor.instances = 2
  spark.executor.cores = 100
  spark.executor.memory = "8g"

}

source {
    jdbc {
        
    }
}

transform {

}

sink {
   Console {}
}

Running Command

./bin/start-seatunnel-spark.sh  --master local[4] --deploy-mode client  --config ./config/mysql-to-pg.conf
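With --master local[4], Spark runs every task inside the driver JVM, so the spark.executor.* settings in the env block above are not applied; the heap that overflows is the driver's. A minimal env sketch for local mode (the 4g figure is an assumption; size it to your data):

env {
  execution.parallelism = 4
  job.mode = "BATCH"
  spark.app.name = "SeaTunnel"
  # In local[*] mode the driver executes all tasks, so its heap is the
  # one that matters; executor instance/core/memory settings are ignored.
  spark.driver.memory = "4g"   # assumed value, tune to your dataset
}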

Error Exception

java.sql.SQLException: Java heap space
        at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:129)
        at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)
        at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:916)
        at com.mysql.cj.jdbc.ClientPreparedStatement.executeQuery(ClientPreparedStatement.java:972)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:304)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:123)
        at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:411)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:417)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.OutOfMemoryError: Java heap space
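The failure surfaces inside ClientPreparedStatement.executeQuery because, by default, MySQL Connector/J materializes the entire result set in client memory. The driver only streams rows when the statement fetch size is Integer.MIN_VALUE, or when useCursorFetch=true is set on the URL together with a positive fetch size. A sketch of a source block taking the cursor-fetch route — host, database, table, and credentials are placeholders, and the exact option keys of the jdbc source vary between SeaTunnel releases:

source {
  jdbc {
    # useCursorFetch=true makes Connector/J honor a positive fetch size
    # instead of buffering the whole result set on the client.
    url = "jdbc:mysql://host:3306/db?useCursorFetch=true"
    driver = "com.mysql.cj.jdbc.Driver"
    user = "user"                      # placeholder
    password = "password"              # placeholder
    query = "SELECT * FROM my_table"   # hypothetical table
    fetch_size = 1000                  # honored only with useCursorFetch=true
  }
}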

Flink or Spark Version

spark-2.4.8-bin-hadoop2.7

Java or Scala Version

No response

Screenshots

No response

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

cloudhuang · Apr 18 '23

This bug may be fixed by #4502.

CheneyYin · Apr 18 '23

> This bug may be fixed by #4502.

Cool, thanks. So which version should I use?

cloudhuang · Apr 18 '23

> Cool, thanks. So which version should I use?

It will be released in 2.3.2, which has not been released yet. You can pull the dev branch from GitHub and build it yourself; see the Setup guide.

CheneyYin · Apr 18 '23

> It will be released in 2.3.2, which has not been released yet. You can pull the dev branch from GitHub and build it yourself; see the Setup guide.

Due to a poor connection (GFW), it takes a long time to download all the dependencies from the Maven repo. Do you have a release schedule for this fix? If so, I'll try it again later. Thanks.

cloudhuang · Apr 18 '23

You can create a codespace (4 cores, 8 GB memory) from the dev branch. After the build finishes, you can download the tarball from it. That should help.
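For reference, a sketch of the local build that produces the distribution tarball; the module name and flags follow the project's standard Maven layout and may differ on your branch:

git clone https://github.com/apache/seatunnel.git
cd seatunnel
# Build only the distribution module and what it needs; skip tests.
./mvnw clean package -pl seatunnel-dist -am -Dmaven.test.skip=true
# The tarball should appear under seatunnel-dist/target/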

CheneyYin · Apr 18 '23

This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in the next 7 days if no further activity occurs.

github-actions[bot] · May 19 '23

I am having a similar failure with v2.3.2, under Java 17.0.7 (2023-04-18 LTS), running with -m local:

2023-07-11 15:47:14,211 WARN  org.apache.seatunnel.engine.server.TaskExecutionService - [localhost]:5801 [seatunnel-495998] [5.1] Exception in org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask@67644eb2
java.lang.OutOfMemoryError: Java heap space
...
2023-07-11 15:47:12,347 ERROR org.apache.seatunnel.engine.server.dag.physical.PhysicalVertex - Job SeaTunnel_Job (731233925082382337), Pipeline: [(1/1)], task: [pipeline-1 [Source[0]-JDBC-default-identifier]-SourceTask (1/2)] end with state FAILED and Exception: java.lang.OutOfMemoryError: Java heap space

My config is:

env {
  # You can set SeaTunnel environment configuration here
  execution.parallelism = 2
  job.mode = "BATCH"
  checkpoint.interval = 10000
  #execution.checkpoint.interval = 10000
  #execution.checkpoint.data-uri = "hdfs://localhost:9000/checkpoint"
}

source {
  JDBC {
    url = "jdbc:mysql:<>"
    driver = "com.mysql.cj.jdbc.Driver"
    user = <>
    password = <>
    query = "SELECT from <>"
    parallelism = 2
    fetch_size = 500
  }
}

sink {
  localFile {
    path = "test_log"
    file_format_type = "parquet"
  }
}

The expected parquet folder is not being created.
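Two things may be at play. Against MySQL, a positive fetch_size such as 500 is only honored when useCursorFetch=true is on the JDBC URL; otherwise Connector/J buffers the whole result set regardless. And with the Zeta local engine the heap is the SeaTunnel JVM's own, not a Spark property. A sketch of raising it, assuming the launcher accepts -DJvmOption (verify against your release); mysql-to-file.conf is a placeholder name:

# -DJvmOption support and the config filename are assumptions.
./bin/seatunnel.sh -m local \
  -DJvmOption="-Xms2g -Xmx4g" \
  --config ./config/mysql-to-file.conf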

mgierdal · Jul 11 '23

> I am having a similar failure with v2.3.2, under Java 17.0.7 (2023-04-18 LTS) [...] The expected parquet folder is not being created.

Do you have the same problem when using the Spark engine?

CheneyYin · Jul 12 '23

I can only use -m local; I have no access to Spark or Flink.

mgierdal · Jul 25 '23