Fatal Error with TPC-DS benchmark, n2-highmem-32 or n2-standard-32
**Describe the bug**
A fatal error occurs in many of the TPC-DS benchmark queries when run on Dataproc with Gazelle enabled. The cluster was created using the configurations/instructions in the README file.
/bin/bash: line 1: 28554 Aborted (core dumped) LD_LIBRARY_PATH="/opt/benchmark-tools/oap/lib:/opt/benchmark-tools/oap/lib" /usr/lib/jvm/adoptopenjdk-8-hotspot-amd64/bin/java -server -Xmx8192m '-XX:+UseParallelOldGC' '-XX:ParallelGCThreads=5' '-XX:NewRatio=1' '-XX:SurvivorRatio=1' '-XX:+UseCompressedOops' '-verbose:gc' '-XX:+PrintGCDetails' '-XX:+PrintGCTimeStamps' -Djava.io.tmpdir=/mnt/2/hadoop/yarn/nm-local-dir/usercache/eman_copty_intel_com/appcache/application_1642622251224_0002/container_1642622251224_0002_01_000007/tmp '-Dspark.driver.port=46363' '-Dspark.network.timeout=3600s' '-Dspark.authenticate=false' '-Dspark.ui.port=0' '-Dspark.rpc.message.maxSize=512' -Dspark.yarn.app.container.log.dir=/var/log/hadoop-yarn/userlogs/application_1642622251224_0002/container_1642622251224_0002_01_000007 -XX:OnOutOfMemoryError='kill %p' org.apache.spark.executor.YarnCoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@aop-gazelle-m.c.articulate-rain-321323.internal:46363 --executor-id 7 --hostname aop-gazelle-w-0.c.articulate-rain-321323.internal --cores 8 --app-id application_1642622251224_0002 --resourceProfileId 0 --user-class-path file:/mnt/2/hadoop/yarn/nm-local-dir/usercache/eman_copty_intel_com/appcache/application_1642622251224_0002/container_1642622251224_0002_01_000007/__app__.jar --user-class-path file:/mnt/2/hadoop/yarn/nm-local-dir/usercache/eman_copty_intel_com/appcache/application_1642622251224_0002/container_1642622251224_0002_01_000007/spark-sql-perf_2.12-0.5.1-SNAPSHOT.jar > /var/log/hadoop-yarn/userlogs/application_1642622251224_0002/container_1642622251224_0002_01_000007/stdout 2> /var/log/hadoop-yarn/userlogs/application_1642622251224_0002/container_1642622251224_0002_01_000007/stderr
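For context, the executor JVM options visible in the launch command above are the kind normally passed through `spark.executor.extraJavaOptions`. A minimal sketch of the equivalent submit-time configuration (values copied from the log line; the rest of the submit command is assumed, not taken from the cluster's actual spark-defaults.conf):

```bash
# Sketch only: mirrors the executor JVM flags visible in the launch command
# above. Real values come from the cluster's Spark configuration (assumed,
# not reproduced verbatim here); <your-app.jar> is a placeholder.
spark-submit \
  --conf spark.executor.memory=8g \
  --conf spark.network.timeout=3600s \
  --conf spark.rpc.message.maxSize=512 \
  --conf "spark.executor.extraJavaOptions=-XX:+UseParallelOldGC -XX:ParallelGCThreads=5 -XX:NewRatio=1 -XX:SurvivorRatio=1 -XX:+UseCompressedOops -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
  <your-app.jar>
```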
The test ends with a fatal error:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f213935aca7, pid=25500, tid=0x00007f213a6d7700
#
# JRE version: OpenJDK Runtime Environment (8.0_292-b10) (build 1.8.0_292-b10)
# Java VM: OpenJDK 64-Bit Server VM (25.292-b10 mixed mode linux-amd64 )
# Problematic frame:
# V [libjvm.so+0x90fca7] Monitor::ILock(Thread*) [clone .part.2]+0x17
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /tmp/spark_columnar_plugin_7516760349859573041/hs_err_pid25500.log
#
# If you would like to submit a bug report, please visit:
# https://github.com/AdoptOpenJDK/openjdk-support/issues
#
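Since the JRE message notes that core dumps were disabled, one way to collect more diagnostics on a worker is roughly the following (a sketch; it assumes SSH access to the worker, and for executors launched by YARN the limit must apply to the NodeManager's environment, not just an interactive shell):

```bash
# Sketch: enable core dumps before the JVM starts, as the JRE message
# suggests. For YARN containers this has to take effect in the
# NodeManager's environment (assumption about the service setup).
ulimit -c unlimited

# Inspect the JVM fatal-error report referenced in the crash output above.
cat /tmp/spark_columnar_plugin_7516760349859573041/hs_err_pid25500.log
```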
**To Reproduce** (with OAP 1.3.0.dataproc20)
Create a cluster in Dataproc using the instructions in [Gazelle_on_Dataproc.md](https://github.com/oap-project/oap-tools/blob/master/integrations/oap/dataproc/benchmark/Gazelle_on_Dataproc.md), with 4 local SSDs per worker, Ubuntu 18, and n2-highmem-32 or n2-standard-32 machine types. A sketch of a matching cluster-creation command follows.
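This is not the exact command from the linked guide, only a hypothetical sketch of the cluster shape described above; the region, worker count, and bootstrap script path are placeholders:

```bash
# Hypothetical sketch: cluster name taken from the hostnames in the log;
# the OAP/Gazelle initialization action path is a placeholder -- use the
# one given in the linked guide.
gcloud dataproc clusters create aop-gazelle \
  --region=us-central1 \
  --image-version=2.0-ubuntu18 \
  --master-machine-type=n2-highmem-32 \
  --worker-machine-type=n2-highmem-32 \
  --num-workers=2 \
  --num-worker-local-ssds=4 \
  --initialization-actions=gs://<your-bucket>/bootstrap_oap.sh
```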
**Expected behavior**
No fatal errors should appear during the test run.