
SeaTunnel with Spark: Caused by: java.lang.OutOfMemoryError: Java heap space

Open · cloudhuang opened this issue · 8 comments

Search before asking

  • [X] I had searched in the issues and found no similar issues.

What happened

java.sql.SQLException: Java heap space

SeaTunnel Version

apache-seatunnel-incubating-2.1.3

SeaTunnel Config

env {
  execution.parallelism = 4
  job.mode = "BATCH"

  spark.app.name = "SeaTunnel"
  spark.executor.instances = 2
  spark.executor.cores = 100
  spark.executor.memory = "8g"

}

source {
    jdbc {
        
    }
}

transform {

}

sink {
   Console {}
}

Running Command

./bin/start-seatunnel-spark.sh  --master local[4] --deploy-mode client  --config ./config/mysql-to-pg.conf
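With --master local[4], Spark runs every task inside the driver JVM, so the spark.executor.* settings in the env block above are not applied; the heap that overflows is the driver's. A minimal env sketch for local mode (the 4g figure is an assumption; size it to your data):

env {
  execution.parallelism = 4
  job.mode = "BATCH"
  spark.app.name = "SeaTunnel"
  # In local[*] mode the driver executes all tasks, so its heap is the
  # one that matters; executor instance/core/memory settings are ignored.
  spark.driver.memory = "4g"   # assumed value, tune to your dataset
}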

Error Exception

java.sql.SQLException: Java heap space
        at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:129)
        at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)
        at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:916)
        at com.mysql.cj.jdbc.ClientPreparedStatement.executeQuery(ClientPreparedStatement.java:972)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:304)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:123)
        at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:411)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:417)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.OutOfMemoryError: Java heap space
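The failure surfaces inside ClientPreparedStatement.executeQuery because, by default, MySQL Connector/J materializes the entire result set in client memory. The driver only streams rows when the statement fetch size is Integer.MIN_VALUE, or when useCursorFetch=true is set on the URL together with a positive fetch size. A sketch of a source block taking the cursor-fetch route — host, database, table, and credentials are placeholders, and the exact option keys of the jdbc source vary between SeaTunnel releases:

source {
  jdbc {
    # useCursorFetch=true makes Connector/J honor a positive fetch size
    # instead of buffering the whole result set on the client.
    url = "jdbc:mysql://host:3306/db?useCursorFetch=true"
    driver = "com.mysql.cj.jdbc.Driver"
    user = "user"                      # placeholder
    password = "password"              # placeholder
    query = "SELECT * FROM my_table"   # hypothetical table
    fetch_size = 1000                  # honored only with useCursorFetch=true
  }
}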

Flink or Spark Version

spark-2.4.8-bin-hadoop2.7

Java or Scala Version

No response

Screenshots

No response

Are you willing to submit PR?

  • [ ] Yes I am willing to submit a PR!

Code of Conduct

cloudhuang · Apr 18 '23

This bug may be fixed by #4502.

CheneyYin · Apr 18 '23

> This bug may be fixed by #4502.

Cool, thanks. So which version should I use?

cloudhuang · Apr 18 '23

> Cool, thanks. So which version should I use?

It will be released in 2.3.2, which has not been released yet. You can pull the dev branch from GitHub and build it yourself; see the Setup guide.

CheneyYin · Apr 18 '23

> It will be released in 2.3.2, which has not been released yet. You can pull the dev branch from GitHub and build it yourself; see the Setup guide.

Due to a poor connection (GFW), it takes a long time to download all the dependencies from the Maven repo. Do you have a release schedule for this fix? If so, I'll try it again later. Thanks.

cloudhuang · Apr 18 '23

You can create a codespace (4 cores, 8 GB memory) from the dev branch. After the build finishes, you can download the tarball from it. That should help.
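For reference, a sketch of the local build that produces the distribution tarball; the module name and flags follow the project's standard Maven layout and may differ on your branch:

git clone https://github.com/apache/seatunnel.git
cd seatunnel
# Build only the distribution module and what it needs; skip tests.
./mvnw clean package -pl seatunnel-dist -am -Dmaven.test.skip=true
# The tarball should appear under seatunnel-dist/target/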

CheneyYin · Apr 18 '23

This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in the next 7 days if no further activity occurs.

github-actions[bot] · May 19 '23

I am having a similar failure with v2.3.2, under Java 17.0.7 (2023-04-18 LTS), running with -m local:

2023-07-11 15:47:14,211 WARN  org.apache.seatunnel.engine.server.TaskExecutionService - [localhost]:5801 [seatunnel-495998] [5.1] Exception in org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask@67644eb2
java.lang.OutOfMemoryError: Java heap space
...
2023-07-11 15:47:12,347 ERROR org.apache.seatunnel.engine.server.dag.physical.PhysicalVertex - Job SeaTunnel_Job (731233925082382337), Pipeline: [(1/1)], task: [pipeline-1 [Source[0]-JDBC-default-identifier]-SourceTask (1/2)] end with state FAILED and Exception: java.lang.OutOfMemoryError: Java heap space

My config is:

env {
  # You can set SeaTunnel environment configuration here
  execution.parallelism = 2
  job.mode = "BATCH"
  checkpoint.interval = 10000
  #execution.checkpoint.interval = 10000
  #execution.checkpoint.data-uri = "hdfs://localhost:9000/checkpoint"
}

source {
  JDBC {
    url = "jdbc:mysql:<>"
    driver = "com.mysql.cj.jdbc.Driver"
    user = <>
    password = <>
    query = "SELECT from <>"
    parallelism = 2
    fetch_size = 500
  }
}

sink {
  localFile {
    path = "test_log"
    file_format_type = "parquet"
  }
}

The expected parquet folder is not being created.
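Two things may be at play. Against MySQL, a positive fetch_size such as 500 is only honored when useCursorFetch=true is on the JDBC URL; otherwise Connector/J buffers the whole result set regardless. And with the Zeta local engine the heap is the SeaTunnel JVM's own, not a Spark property. A sketch of raising it, assuming the launcher accepts -DJvmOption (verify against your release); mysql-to-file.conf is a placeholder name:

# -DJvmOption support and the config filename are assumptions.
./bin/seatunnel.sh -m local \
  -DJvmOption="-Xms2g -Xmx4g" \
  --config ./config/mysql-to-file.conf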

mgierdal · Jul 11 '23

> I am having a similar failure with v2.3.2, under Java 17.0.7 (2023-04-18 LTS) [...] The expected parquet folder is not being created.

Do you have the same problem when using the Spark engine?

CheneyYin · Jul 12 '23

I can only use -m local; I have no access to Spark or Flink.

mgierdal · Jul 25 '23