flink-on-k8s-operator
Failed to submit JobGraph, and the exception details are not enough to determine the cause
With the latest master build, I created the example session cluster and job cluster using the flink:1.12.1-scala_2.12-java11 image.
In the test Docker environment, submitting my job fails:
/opt/flink/bin/flink run -m flinksessioncluster-sample-jobmanager:8081 /opt/flink/examples/myfault-1.0-SNAPSHOT.jar
2021-02-04 02:31:03,798 INFO org.apache.flink.client.cli.CliFrontend [] - --------------------------------------------------------------------------------
2021-02-04 02:31:03,801 INFO org.apache.flink.client.cli.CliFrontend [] - Starting Command Line Client (Version: 1.12.1, Scala: 2.12, Rev:dc404e2, Date:2021-01-09T14:46:36+01:00)
2021-02-04 02:31:03,802 INFO org.apache.flink.client.cli.CliFrontend [] - OS current user: root
2021-02-04 02:31:03,802 INFO org.apache.flink.client.cli.CliFrontend [] - Current Hadoop/Kerberos user: <no hadoop dependency found>
2021-02-04 02:31:03,802 INFO org.apache.flink.client.cli.CliFrontend [] - JVM: OpenJDK 64-Bit Server VM - Oracle Corporation - 11/11.0.10+9
2021-02-04 02:31:03,802 INFO org.apache.flink.client.cli.CliFrontend [] - Maximum heap size: 709 MiBytes
2021-02-04 02:31:03,802 INFO org.apache.flink.client.cli.CliFrontend [] - JAVA_HOME: /usr/local/openjdk-11
2021-02-04 02:31:03,803 INFO org.apache.flink.client.cli.CliFrontend [] - No Hadoop Dependency available
2021-02-04 02:31:03,803 INFO org.apache.flink.client.cli.CliFrontend [] - JVM Options:
2021-02-04 02:31:03,803 INFO org.apache.flink.client.cli.CliFrontend [] - -Dlog.file=/opt/flink/log/flink--client-myfault-run-cn9xv.log
2021-02-04 02:31:03,803 INFO org.apache.flink.client.cli.CliFrontend [] - -Dlog4j.configuration=file:/opt/flink/conf/log4j-cli.properties
2021-02-04 02:31:03,803 INFO org.apache.flink.client.cli.CliFrontend [] - -Dlog4j.configurationFile=file:/opt/flink/conf/log4j-cli.properties
2021-02-04 02:31:03,803 INFO org.apache.flink.client.cli.CliFrontend [] - -Dlogback.configurationFile=file:/opt/flink/conf/logback.xml
2021-02-04 02:31:03,804 INFO org.apache.flink.client.cli.CliFrontend [] - Program Arguments:
2021-02-04 02:31:03,806 INFO org.apache.flink.client.cli.CliFrontend [] - run
2021-02-04 02:31:03,806 INFO org.apache.flink.client.cli.CliFrontend [] - -m
2021-02-04 02:31:03,806 INFO org.apache.flink.client.cli.CliFrontend [] - flinksessioncluster-sample-jobmanager:8081
2021-02-04 02:31:03,806 INFO org.apache.flink.client.cli.CliFrontend [] - /opt/flink/examples/myfault-1.0-SNAPSHOT.jar
2021-02-04 02:31:03,806 INFO org.apache.flink.client.cli.CliFrontend [] - Classpath: /opt/flink/lib/flink-csv-1.12.1.jar:/opt/flink/lib/flink-json-1.12.1.jar:/opt/flink/lib/flink-shaded-zookeeper-3.4.14.jar:/opt/flink/lib/flink-table-blink_2.12-1.12.1.jar:/opt/flink/lib/flink-table_2.12-1.12.1.jar:/opt/flink/lib/log4j-1.2-api-2.12.1.jar:/opt/flink/lib/log4j-api-2.12.1.jar:/opt/flink/lib/log4j-core-2.12.1.jar:/opt/flink/lib/log4j-slf4j-impl-2.12.1.jar:/opt/flink/lib/flink-dist_2.12-1.12.1.jar:::
2021-02-04 02:31:03,807 INFO org.apache.flink.client.cli.CliFrontend [] - --------------------------------------------------------------------------------
2021-02-04 02:31:03,811 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: jobmanager.rpc.address, myfault-run-cn9xv
2021-02-04 02:31:03,812 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: jobmanager.rpc.port, 6123
2021-02-04 02:31:03,812 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: jobmanager.memory.process.size, 1600m
2021-02-04 02:31:03,812 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: taskmanager.memory.process.size, 1728m
2021-02-04 02:31:03,812 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: taskmanager.numberOfTaskSlots, 1
2021-02-04 02:31:03,813 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: parallelism.default, 1
2021-02-04 02:31:03,813 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: jobmanager.execution.failover-strategy, region
2021-02-04 02:31:03,814 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: blob.server.port, 6124
2021-02-04 02:31:03,814 INFO org.apache.flink.configuration.GlobalConfiguration [] - Loading configuration property: query.server.port, 6125
2021-02-04 02:31:03,848 INFO org.apache.flink.client.cli.CliFrontend [] - Loading FallbackYarnSessionCli
2021-02-04 02:31:03,945 INFO org.apache.flink.core.fs.FileSystem [] - Hadoop is not in the classpath/dependencies. The extended set of supported File Systems via Hadoop is not available.
2021-02-04 02:31:04,068 INFO org.apache.flink.runtime.security.modules.HadoopModuleFactory [] - Cannot create Hadoop Security Module because Hadoop cannot be found in the Classpath.
2021-02-04 02:31:04,082 INFO org.apache.flink.runtime.security.modules.JaasModule [] - Jaas file will be created as /tmp/jaas-5146463234971937258.conf.
2021-02-04 02:31:04,093 INFO org.apache.flink.runtime.security.contexts.HadoopSecurityContextFactory [] - Cannot install HadoopSecurityContext because Hadoop cannot be found in the Classpath.
2021-02-04 02:31:04,095 INFO org.apache.flink.client.cli.CliFrontend [] - Running 'run' command.
2021-02-04 02:31:04,230 INFO org.apache.flink.client.cli.CliFrontend [] - Building program from JAR file
2021-02-04 02:31:04,325 INFO org.apache.flink.client.ClientUtils [] - Starting program (detached: false)
2021-02-04 02:31:16,070 WARN org.apache.flink.util.ExecutorUtils [] - ExecutorService did not terminate in time. Shutting it down now.
2021-02-04 02:31:16,074 ERROR org.apache.flink.client.cli.CliFrontend [] - Error while running the command.
org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: Failed to execute job 'Fraud Detection'.
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:360) ~[flink-dist_2.12-1.12.1.jar:1.12.1]
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:213) ~[flink-dist_2.12-1.12.1.jar:1.12.1]
at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:114) ~[flink-dist_2.12-1.12.1.jar:1.12.1]
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:816) ~[flink-dist_2.12-1.12.1.jar:1.12.1]
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:248) ~[flink-dist_2.12-1.12.1.jar:1.12.1]
at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1058) ~[flink-dist_2.12-1.12.1.jar:1.12.1]
at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1136) ~[flink-dist_2.12-1.12.1.jar:1.12.1]
at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28) [flink-dist_2.12-1.12.1.jar:1.12.1]
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1136) [flink-dist_2.12-1.12.1.jar:1.12.1]
Caused by: org.apache.flink.util.FlinkException: Failed to execute job 'Fraud Detection'.
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1918) ~[flink-dist_2.12-1.12.1.jar:1.12.1]
at org.apache.flink.client.program.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:135) ~[flink-dist_2.12-1.12.1.jar:1.12.1]
at org.apache.flink.client.program.StreamContextEnvironment.execute(StreamContextEnvironment.java:76) ~[flink-dist_2.12-1.12.1.jar:1.12.1]
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1782) ~[flink-dist_2.12-1.12.1.jar:1.12.1]
at org.allstoalls.FraudDetectionJob.main(FraudDetectionJob.java:48) ~[?:?]
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:?]
at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) ~[?:?]
at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) ~[?:?]
at java.lang.reflect.Method.invoke(Unknown Source) ~[?:?]
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:343) ~[flink-dist_2.12-1.12.1.jar:1.12.1]
... 8 more
Caused by: org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.
at org.apache.flink.client.program.rest.RestClusterClient.lambda$submitJob$7(RestClusterClient.java:400) ~[flink-dist_2.12-1.12.1.jar:1.12.1]
at java.util.concurrent.CompletableFuture.uniExceptionally(Unknown Source) ~[?:?]
at java.util.concurrent.CompletableFuture$UniExceptionally.tryFire(Unknown Source) ~[?:?]
at java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:?]
at java.util.concurrent.CompletableFuture.completeExceptionally(Unknown Source) ~[?:?]
at org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$9(FutureUtils.java:364) ~[flink-dist_2.12-1.12.1.jar:1.12.1]
at java.util.concurrent.CompletableFuture.uniWhenComplete(Unknown Source) ~[?:?]
at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(Unknown Source) ~[?:?]
at java.util.concurrent.CompletableFuture.postComplete(Unknown Source) ~[?:?]
at java.util.concurrent.CompletableFuture.postFire(Unknown Source) ~[?:?]
at java.util.concurrent.CompletableFuture$UniCompose.tryFire(Unknown Source) ~[?:?]
at java.util.concurrent.CompletableFuture$Completion.run(Unknown Source) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
at java.lang.Thread.run(Unknown Source) ~[?:?]
Caused by: org.apache.flink.runtime.rest.util.RestClientException: [Internal server error: Java heap space]
at org.apache.flink.runtime.rest.RestClient.parseResponse(RestClient.java:486) ~[flink-dist_2.12-1.12.1.jar:1.12.1]
at org.apache.flink.runtime.rest.RestClient.lambda$submitRequest$3(RestClient.java:466) ~[flink-dist_2.12-1.12.1.jar:1.12.1]
at java.util.concurrent.CompletableFuture$UniCompose.tryFire(Unknown Source) ~[?:?]
at java.util.concurrent.CompletableFuture$Completion.run(Unknown Source) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:?]
at java.lang.Thread.run(Unknown Source) ~[?:?]
In the same Docker container, submitting the bundled batch example works fine:
/opt/flink/bin/flink run -m flinksessioncluster-sample-jobmanager:8081 /opt/flink/examples/batch/WordCount.jar --input /opt/flink/README.txt
So what is the real cause behind [Internal server error: Java heap space]? The same jar runs fine on a local Flink cluster.
Is there a way to debug this?
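One way to see the heap the session cluster's JobManager was actually started with is Flink's REST API. It exposes the JobManager configuration at `/jobmanager/config` as a JSON list of `{"key": ..., "value": ...}` entries.

Below is a minimal Java 11 sketch. It assumes the REST endpoint is reachable at `http://flinksessioncluster-sample-jobmanager:8081` from the pod running `flink run`. The class and method names are hypothetical, and the regex lookup is a crude stand-in for a real JSON parser. Whether `jobmanager.memory.heap.size` is listed depends on how the JobManager was started, so the sketch falls back to `<not reported>`.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: prints the JobManager heap size reported by the Flink REST API.
class JobManagerConfigCheck {

    // Crude lookup of {"key":"<key>","value":"<value>"} in the /jobmanager/config response.
    // Assumes "key" precedes "value" in each entry and values contain no escaped quotes.
    static Optional<String> findConfigValue(String json, String key) {
        Pattern p = Pattern.compile(
                "\"key\"\\s*:\\s*\"" + Pattern.quote(key)
                        + "\"\\s*,\\s*\"value\"\\s*:\\s*\"([^\"]*)\"");
        Matcher m = p.matcher(json);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumed REST address: the same host:port passed to `flink run -m`.
        String base = args.length > 0 ? args[0] : "http://flinksessioncluster-sample-jobmanager:8081";
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(base + "/jobmanager/config")).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("jobmanager.memory.heap.size = "
                + findConfigValue(response.body(), "jobmanager.memory.heap.size").orElse("<not reported>"));
    }
}
```

Checking the JobManager pod's own log, not the client log above, is the other place where an out-of-memory error on the server side should show up.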
Figured it out:
the default JobManager heap size, jobmanager.memory.heap.size = 25165824b (24 MiB), is too small.
Using this config:
flinkProperties:
  taskmanager.numberOfTaskSlots: "1"
  jobmanager.heap.size: ""              # set empty value (only for Flink version 1.11 or above)
  jobmanager.memory.heap.size: 150mb
  jobmanager.memory.process.size: 1gb   # job manager memory limit (only for Flink version 1.11 or above)
  taskmanager.heap.size: ""             # set empty value
  taskmanager.memory.process.size: 1gb  # task manager memory limit
The job can now be submitted. However, the error message did not point to the actual problem.
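For a rough sense of the numbers: 25165824 bytes is exactly 24 MiB. The sketch below shows, in simplified form, how Flink 1.11/1.12 appears to split `jobmanager.memory.process.size` when `jobmanager.memory.heap.size` is also set explicitly. It assumes the documented defaults:

- 128 MiB JVM off-heap (`jobmanager.memory.off-heap.size`)
- 256 MiB JVM metaspace (`jobmanager.memory.jvm-metaspace.size`)

The remainder is treated as JVM overhead. As we understand the docs, that overhead must fall within `jobmanager.memory.jvm-overhead.min`/`max` (defaults 192 MiB / 1 GiB), otherwise Flink rejects the configuration. This illustrates the memory model and is not Flink's own code; the class name is hypothetical.

```java
// Simplified sketch of the Flink 1.11/1.12 JobManager memory model (not Flink's own code).
class JobManagerMemoryBudget {

    static final long MIB = 1024L * 1024L;

    // Assumed Flink 1.12 defaults, in MiB.
    static final long DEFAULT_OFF_HEAP_MIB = 128;
    static final long DEFAULT_METASPACE_MIB = 256;
    static final long DEFAULT_OVERHEAD_MIN_MIB = 192;
    static final long DEFAULT_OVERHEAD_MAX_MIB = 1024;

    // Flink memory sizes use binary units: 1mb = 1 MiB, 1gb = 1024 MiB.
    static long bytesToMiB(long bytes) {
        return bytes / MIB;
    }

    // JVM overhead left over when both process size and heap size are configured explicitly.
    static long derivedOverheadMiB(long processMiB, long heapMiB) {
        long totalFlinkMiB = heapMiB + DEFAULT_OFF_HEAP_MIB;
        long overheadMiB = processMiB - totalFlinkMiB - DEFAULT_METASPACE_MIB;
        if (overheadMiB < DEFAULT_OVERHEAD_MIN_MIB || overheadMiB > DEFAULT_OVERHEAD_MAX_MIB) {
            throw new IllegalArgumentException("derived JVM overhead " + overheadMiB
                    + " MiB is outside [" + DEFAULT_OVERHEAD_MIN_MIB + ", "
                    + DEFAULT_OVERHEAD_MAX_MIB + "] MiB");
        }
        return overheadMiB;
    }

    public static void main(String[] args) {
        System.out.println("default heap: " + bytesToMiB(25165824L) + " MiB");
        // Values from the working config above: process.size 1gb, heap.size 150mb.
        System.out.println("JVM overhead: " + derivedOverheadMiB(1024, 150) + " MiB");
    }
}
```

With the config above, this gives 1024 - (150 + 128) - 256 = 490 MiB of JVM overhead, which is inside the default range. Raising the heap to 600mb without raising the process size would leave only 40 MiB and be rejected.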