SeaTunnel with Spark: Caused by: java.lang.OutOfMemoryError: Java heap space
Search before asking
- [X] I had searched in the issues and found no similar issues.
What happened
java.sql.SQLException: Java heap space
SeaTunnel Version
apache-seatunnel-incubating-2.1.3
SeaTunnel Config
env {
  execution.parallelism = 4
  job.mode = "BATCH"
  spark.app.name = "SeaTunnel"
  spark.executor.instances = 2
  spark.executor.cores = 100
  spark.executor.memory = "8g"
}

source {
  jdbc {
  }
}

transform {
}

sink {
  Console {}
}
Running Command
./bin/start-seatunnel-spark.sh --master local[4] --deploy-mode client --config ./config/mysql-to-pg.conf
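Note that with --master local[4] the whole job runs in a single driver JVM, so the spark.executor.instances/cores/memory settings above have no effect; the heap is bounded by the driver's memory. A minimal sketch of an env block sized for local mode (spark.driver.memory is a standard Spark property; whether this script forwards it depends on the SeaTunnel version, so the --driver-memory flag on spark-submit is the surer route):

env {
  execution.parallelism = 4
  job.mode = "BATCH"
  spark.app.name = "SeaTunnel"
  # In local mode executors run inside the driver process, so size the
  # driver heap rather than spark.executor.memory:
  spark.driver.memory = "8g"
}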
Error Exception
java.sql.SQLException: Java heap space
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:129)
at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)
at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:916)
at com.mysql.cj.jdbc.ClientPreparedStatement.executeQuery(ClientPreparedStatement.java:972)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:304)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:411)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:417)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.OutOfMemoryError: Java heap space
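The OOM surfacing inside ClientPreparedStatement.executeQuery is characteristic of MySQL Connector/J's default behavior: it reads the entire result set into client memory before returning. The driver only streams row by row when the fetch size is Integer.MIN_VALUE, or fetches in server-side chunks when useCursorFetch=true is added to the JDBC URL together with a positive fetch size. A minimal sketch of the URL change (host and database are placeholders):

# enable server-side cursor fetching so rows arrive in chunks
url = "jdbc:mysql://host:3306/db?useCursorFetch=true"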
Flink or Spark Version
spark-2.4.8-bin-hadoop2.7
Java or Scala Version
No response
Screenshots
No response
Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
This bug may be fixed in #4502.
Cool, thanks, so which version should I use?
It will be released in 2.3.2, which has not been released yet. You can pull the dev branch from GitHub and build it yourself: Setup
Due to the poor connection (GFW), it takes a long time to download all the dependencies from the Maven repo. Do you have a release schedule for this fix? Then I'll try it again later, thanks.
You can create a Codespace (4 cores, 8 GB memory) from the dev branch. After building, you can download the tarball from there. That should help.
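For reference, a rough sketch of building a snapshot tarball from the dev branch (exact module layout and Maven flags can vary between versions):

git clone https://github.com/apache/seatunnel.git
cd seatunnel && git checkout dev
./mvnw clean package -DskipTests   # plain mvn works if the wrapper is absent
# the binary tarball is typically produced under seatunnel-dist/target/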
This issue has been automatically marked as stale because it has not had recent activity for 30 days. It will be closed in next 7 days if no further activity occurs.
I am having a similar failure with ver. 2.3.2, under Java 17.0.7 (2023-04-18 LTS), running with -m local.
2023-07-11 15:47:14,211 WARN org.apache.seatunnel.engine.server.TaskExecutionService - [localhost]:5801 [seatunnel-495998] [5.1] Exception in org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask@67644eb2
java.lang.OutOfMemoryError: Java heap space
...
2023-07-11 15:47:12,347 ERROR org.apache.seatunnel.engine.server.dag.physical.PhysicalVertex - Job SeaTunnel_Job (731233925082382337), Pipeline: [(1/1)], task: [pipeline-1 [Source[0]-JDBC-default-identifier]-SourceTask (1/2)] end with state FAILED and Exception: java.lang.OutOfMemoryError: Java heap space
My config is:
env {
  # You can set SeaTunnel environment configuration here
  execution.parallelism = 2
  job.mode = "BATCH"
  checkpoint.interval = 10000
  #execution.checkpoint.interval = 10000
  #execution.checkpoint.data-uri = "hdfs://localhost:9000/checkpoint"
}

source {
  JDBC {
    url = "jdbc:mysql:<>"
    driver = "com.mysql.cj.jdbc.Driver"
    user = <>
    password = <>
    query = "SELECT from <>"
    parallelism = 2
    fetch_size = 500
  }
}

sink {
  localFile {
    path = "test_log"
    file_format_type = "parquet"
  }
}
The expected parquet folder is not being created.
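With the MySQL driver, fetch_size = 500 on its own does not stop Connector/J from buffering the entire result set; a positive fetch size is only honored when useCursorFetch=true is set on the URL. A hedged variant of the source block (placeholders kept; partition_column/partition_num are 2.3.x JDBC source options and "id" is an assumed column name, so check the docs for your build):

source {
  JDBC {
    # useCursorFetch=true makes Connector/J honor a positive fetch size
    url = "jdbc:mysql:<>?useCursorFetch=true"
    driver = "com.mysql.cj.jdbc.Driver"
    user = <>
    password = <>
    query = "SELECT from <>"
    fetch_size = 500
    # optionally split the read across parallel readers by a numeric column
    partition_column = "id"
    partition_num = 2
  }
}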
Do you have the same problem when using the Spark engine?
I can only use -m local, having no access to Spark or Flink.
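Since -m local runs everything in one JVM, the most direct workaround is to give that JVM more heap. A sketch, assuming the 2.3.x launcher's -DJvmOption pass-through (./config/job.conf is a placeholder; older builds may instead need the options set via JAVA_OPTS or edited into the start script):

./bin/seatunnel.sh --config ./config/job.conf -m local -DJvmOption="-Xms4g -Xmx4g"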