Management Portal not responding

Open jitinkumar2018 opened this issue 5 years ago • 11 comments

Bug Report Basic Information

REQUIRED:
vCenter Server version: 6.7 U1
Embedded or external PSC: No
Filename of the OVA you deployed: vic-dev-v1.5.0-rc2-6834-28057308.ova
How was the OVA deployed? ovftool
Does the VIC appliance receive configuration by DHCP? No
What stage of the Appliance Lifecycle is the VIC appliance in? Application
VIC appliance logs: vic_appliance_logs_2019-01-08-10-16-54.tar.gz

Bug Report Detailed Information

Admiral stopped responding 24 hours after deployment.

DETAILS: The VIC appliance was deployed, followed by 50 VCHs. 50 projects were created, the VCHs were added to project-p01, and we could see these changes in Admiral for a day. Management portal: https://vic-st-h2-132.eng.vmware.com:8282

We can access the VIC startup page at https://10.197.37.132:9443/ but the management portal is not responding now. There was a time skew of 2 minutes between the VC and the VIC appliance. Admiral is still not responding even after the VIC appliance time was corrected.

jitinkumar2018 avatar Jan 08 '19 10:01 jitinkumar2018

The time skew was corrected, but we still cannot open port 8282. @martin-borisov

lgayatri avatar Jan 08 '19 11:01 lgayatri

@renmaosheng @martin-borisov this is a blocker.

lgayatri avatar Jan 09 '19 04:01 lgayatri

        at com.vmware.xenon.services.common.LuceneDocumentIndexService.createPaginatedQuerySearcher(LuceneDocumentIndexService.java:1352)
        at com.vmware.xenon.services.common.LuceneDocumentIndexService.createOrUpdatePaginatedQuerySearcher(LuceneDocumentIndexService.java:1265)
        at com.vmware.xenon.services.common.LuceneDocumentIndexService.handleQueryTaskPatch(LuceneDocumentIndexService.java:1215)
        at com.vmware.xenon.services.common.LuceneDocumentIndexService.handleQueryRequest(LuceneDocumentIndexService.java:1089)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
]
[398108][S][2019-01-08T09:37:18.731Z][25][8282/resources/container-control-loop/control-loop-info][lambda$performMaintenance$4][Failed to retrieve container descriptions]
[398110][W][2019-01-08T09:37:18.731Z][25][8282/][processPendingServiceAvailableOperations][Service /core/local-query-tasks/31b194e97e6bb87557eef176b0f2c failed start: java.util.concurrent.CancellationException: Index writer is null]
[398111][W][2019-01-08T09:37:18.731Z][21][8282/][lambda$performServiceMaintenance$1][Service /resources/hosts-data-collections/host-info-data-collection failed maintenance: java.lang.IllegalStateException: Writer not available
        at com.vmware.xenon.services.common.LuceneDocumentIndexService.createPaginatedQuerySearcher(LuceneDocumentIndexService.java:1352)
        at com.vmware.xenon.services.common.LuceneDocumentIndexService.createOrUpdatePaginatedQuerySearcher(LuceneDocumentIndexService.java:1265)
        at com.vmware.xenon.services.common.LuceneDocumentIndexService.handleQueryTaskPatch(LuceneDocumentIndexService.java:1215)
        at com.vmware.xenon.services.common.LuceneDocumentIndexService.handleQueryRequest(LuceneDocumentIndexService.java:1089)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

lgayatri avatar Jan 09 '19 04:01 lgayatri

Admiral crashed with an OOM error:

Jan 08 09:35:55 vic-st-h2-132.eng.vmware.com start_admiral.sh[1810]: java.lang.OutOfMemoryError: GC overhead limit exceeded
Jan 08 09:35:55 vic-st-h2-132.eng.vmware.com start_admiral.sh[1810]: Dumping heap to /var/admiral/java_pid5.hprof ...
Jan 08 09:35:59 vic-st-h2-132.eng.vmware.com start_admiral.sh[1810]: Heap dump file created [896882459 bytes in 3.817 secs]
Jan 08 09:37:16 vic-st-h2-132.eng.vmware.com start_admiral.sh[1810]: Exception in thread "Lucene Merge Thread #6206" org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: GC overhead limit exceeded
Jan 08 09:37:16 vic-st-h2-132.eng.vmware.com start_admiral.sh[1810]:         at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:703)
Jan 08 09:37:16 vic-st-h2-132.eng.vmware.com start_admiral.sh[1810]:         at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:683)
Jan 08 09:37:16 vic-st-h2-132.eng.vmware.com start_admiral.sh[1810]: Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
Jan 08 09:37:25 vic-st-h2-132.eng.vmware.com docker[711286]: vic-admiral
Jan 08 09:37:25 vic-st-h2-132.eng.vmware.com docker[711294]: vic-admiral
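
To locate the first OOM in a longer journal export, the log lines above can be scanned mechanically. The following is a minimal sketch, assuming the journal prefix format seen in this thread (`Mon DD HH:MM:SS host unit[pid]: message`); the function name `find_oom_events` and the `SAMPLE` list are hypothetical and only for illustration.

```python
import re

# Journal prefix as it appears in this thread:
#   "Jan 08 09:35:55 <host> <unit>[<pid>]: <message>"
JOURNAL_RE = re.compile(
    r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2}) (\S+) ([^\[\s]+)\[(\d+)\]: (.*)$"
)


def find_oom_events(lines):
    """Return (timestamp, unit, message) for every line reporting an OutOfMemoryError."""
    events = []
    for line in lines:
        match = JOURNAL_RE.match(line)
        if match is None:
            # Wrapped continuation lines carry no journal prefix; skip them.
            continue
        timestamp, _host, unit, _pid, message = match.groups()
        if "java.lang.OutOfMemoryError" in message:
            events.append((timestamp, unit, message.strip()))
    return events


# A few lines copied from the log above (hypothetical sample variable).
SAMPLE = [
    "Jan 08 09:35:55 vic-st-h2-132.eng.vmware.com start_admiral.sh[1810]: java.lang.OutOfMemoryError: GC overhead limit exceeded",
    "Jan 08 09:35:55 vic-st-h2-132.eng.vmware.com start_admiral.sh[1810]: Dumping heap to /var/admiral/java_pid5.hprof ...",
    "Jan 08 09:37:16 vic-st-h2-132.eng.vmware.com start_admiral.sh[1810]: Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded",
    "Jan 08 09:37:25 vic-st-h2-132.eng.vmware.com docker[711286]: vic-admiral",
]

if __name__ == "__main__":
    for timestamp, unit, message in find_oom_events(SAMPLE):
        print(timestamp, unit, message)
```

The earliest event returned marks when the JVM first reported the error, which is the useful timestamp to compare against the point where the portal stopped responding.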

DanielXiao avatar Jan 09 '19 05:01 DanielXiao

Is there a memory leak, or does Admiral start with too little memory? It runs with "-Xmx768M -Xms768M -Xss256K -Xmn256M":

Jan 07 06:11:10 vic-st-h2-132.eng.vmware.com start_admiral.sh[1810]: + '[' false = true ']'
Jan 07 06:11:10 vic-st-h2-132.eng.vmware.com start_admiral.sh[1810]: + '[' x = x ']'
Jan 07 06:11:10 vic-st-h2-132.eng.vmware.com start_admiral.sh[1810]: + MEMORY_OPTS='-Xmx768M -Xms768M -Xss256K -Xmn256M -XX:MaxMetaspaceSize=256m'
Jan 07 06:11:10 vic-st-h2-132.eng.vmware.com start_admiral.sh[1810]: + CONFIG_FILES=/admiral/config/dist_configuration.properties
Jan 07 06:11:10 vic-st-h2-132.eng.vmware.com start_admiral.sh[1810]: + '[' -f /configs/config.properties ']'
Jan 07 06:11:10 vic-st-h2-132.eng.vmware.com start_admiral.sh[1810]: + CONFIG_FILES=/admiral/config/dist_configuration.properties,/configs/config.properties
Jan 07 06:11:10 vic-st-h2-132.eng.vmware.com start_admiral.sh[1810]: + '[' x = x ']'
Jan 07 06:11:10 vic-st-h2-132.eng.vmware.com start_admiral.sh[1810]: + XENON_PHOTON_MODEL_PROPS='-Dservice.document.version.retention.limit=50 -Dservice.document.version.retention.floor=10'
Jan 07 06:11:10 vic-st-h2-132.eng.vmware.com start_admiral.sh[1810]: + '[' x = x ']'
Jan 07 06:11:10 vic-st-h2-132.eng.vmware.com start_admiral.sh[1810]: + XENON_STACKTRACE=-Dxenon.ServiceErrorResponse.disableStackTraceCollection=true
Jan 07 06:11:10 vic-st-h2-132.eng.vmware.com start_admiral.sh[1810]: + JAVA_OPTS='-Ddcp.net.ssl.trustStore=/configs/trustedcertificates.jks -Ddcp.net.ssl.trustStorePassword=changeit -Dencryption.key.file=/var/admiral/8282/encryption.key -Dinit.encryption.key.file=true -Xmx768M -Xms768M -Xss256K -Xmn256M -XX:MaxMetaspaceSize=256m'
Jan 07 06:11:10 vic-st-h2-132.eng.vmware.com start_admiral.sh[1810]: + JAVA_OPTS='-Ddcp.net.ssl.trustStore=/configs/trustedcertificates.jks -Ddcp.net.ssl.trustStorePassword=changeit -Dencryption.key.file=/var/admiral/8282/encryption.key -Dinit.encryption.key.file=true -Xmx768M -Xms768M -Xss256K -Xmn256M -XX:MaxMetaspaceSize=256m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/admiral/'
Jan 07 06:11:10 vic-st-h2-132.eng.vmware.com start_admiral.sh[1810]: + PID=5
Jan 07 06:11:10 vic-st-h2-132.eng.vmware.com start_admiral.sh[1810]: + wait 5
Jan 07 06:11:10 vic-st-h2-132.eng.vmware.com start_admiral.sh[1810]: + java -Djava.util.logging.config.file=/admiral/config/logging.properties -Dconfiguration.properties=/admiral/config/dist_configuration.properties,/configs/config.properties -Ddcp.net.ssl.trustStore=/configs/trustedcertificates.jks -Ddcp.net.ssl.trustStorePassword=changeit -Dencryption.key.file=/var/admiral/8282/encryption.key -Dinit.encryption.key.file=true -Xmx768M -Xms768M -Xss256K -Xmn256M -XX:MaxMetaspaceSize=256m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/admiral/ -cp '/admiral/*:/admiral/lib/*:/etc/xenon/dynamic-services/*' -Dservice.document.version.retention.limit=50 -Dservice.document.version.retention.floor=10 -Dxenon.ServiceErrorResponse.disableStackTraceCollection=true com.vmware.admiral.host.ManagementHost --bindAddress=0.0.0.0 --port=-1 --sandbox=/var/admiral/ --publicUri=https://vic-st-h2-132.eng.vmware.com:8282/ --bindAddress=0.0.0.0 --port=-1 --authConfig=/configs/psc-config.properties --securePort=8282 --keyFile=/configs/server.key --certificateFile=/configs/server.crt --startMockHostAdapterInstance=false
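
For a rough sense of how much room these flags leave, the `MEMORY_OPTS` value from the trace above can be tallied as follows. This is a sketch, not a statement about Admiral's actual memory behaviour: it assumes a generational collector that honours `-Xmn`, so the old generation is approximately `Xmx - Xmn`. The helper names `parse_size` and `heap_layout` are hypothetical.

```python
import re

# JVM size suffixes are binary multiples (K = 1024 bytes, and so on).
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a JVM size such as '768M' or '256k' to bytes."""
    match = re.fullmatch(r"(\d+)([KkMmGg]?)", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return int(number) * UNITS.get(unit.upper(), 1)


def heap_layout(opts):
    """Extract the memory flags from a JVM option string and derive the old-gen size."""
    sizes = {}
    for token in opts.split():
        match = re.fullmatch(r"-(Xmx|Xms|Xmn|Xss)(\w+)", token)
        if match:
            sizes[match.group(1)] = parse_size(match.group(2))
        elif token.startswith("-XX:MaxMetaspaceSize="):
            sizes["MaxMetaspaceSize"] = parse_size(token.split("=", 1)[1])
    # Assumption: old generation is whatever the maximum heap leaves after the young generation.
    sizes["old_gen"] = sizes["Xmx"] - sizes["Xmn"]
    return sizes


# Copied from the MEMORY_OPTS line in the startup trace.
MEMORY_OPTS = "-Xmx768M -Xms768M -Xss256K -Xmn256M -XX:MaxMetaspaceSize=256m"

if __name__ == "__main__":
    for name, size in heap_layout(MEMORY_OPTS).items():
        print(f"{name}: {size / 1024 ** 2:g} MiB")
```

Under that assumption, the 768 MiB heap leaves about 512 MiB for long-lived objects, which is the space that has to hold the state for 50 VCHs and 50 projects in this test.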

DanielXiao avatar Jan 09 '19 06:01 DanielXiao

@DanielXiao this is the first time we have seen this issue. The Admiral memory setting is the default that ships with the OVA.

lgayatri avatar Jan 09 '19 06:01 lgayatri

The exit status is 0 when the JVM crashes, so systemd does not restart Admiral.

systemctl status admiral.service 
● admiral.service - Admiral is a highly scalable and very lightweight Container Management platform for deploying and managing container based applications.
   Loaded: loaded (/lib/systemd/system/admiral.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Tue 2019-01-08 09:37:25 UTC; 23h ago
     Docs: https://vmware.github.io/vic-product/index.html#getting-started
  Process: 711294 ExecStopPost=/usr/bin/docker rm vic-admiral (code=exited, status=0/SUCCESS)
  Process: 711286 ExecStop=/usr/bin/docker stop vic-admiral (code=exited, status=0/SUCCESS)
  Process: 1811 ExecStartPost=/usr/bin/bash /etc/vmware/admiral/add_default_users.sh (code=exited, status=0/SUCCESS)
  Process: 1810 ExecStart=/etc/vmware/admiral/start_admiral.sh (code=exited, status=0/SUCCESS)
  Process: 1752 ExecStartPre=/usr/bin/bash /etc/vmware/admiral/configure_admiral.sh (code=exited, status=0/SUCCESS)
  Process: 1746 ExecStartPre=/usr/bin/docker rm vic-admiral (code=exited, status=1/FAILURE)
  Process: 1736 ExecStartPre=/usr/bin/docker stop vic-admiral (code=exited, status=1/FAILURE)
 Main PID: 1810 (code=exited, status=0/SUCCESS)
      CPU: 2.979s

Jan 07 06:11:42 vic-st-h2-132.eng.vmware.com systemd[1]: Started Admiral is a highly scalable and very lightweight Container Management platform for deploying and managing container based applications..
Jan 08 09:35:55 vic-st-h2-132.eng.vmware.com start_admiral.sh[1810]: java.lang.OutOfMemoryError: GC overhead limit exceeded
Jan 08 09:35:55 vic-st-h2-132.eng.vmware.com start_admiral.sh[1810]: Dumping heap to /var/admiral/java_pid5.hprof ...
Jan 08 09:35:59 vic-st-h2-132.eng.vmware.com start_admiral.sh[1810]: Heap dump file created [896882459 bytes in 3.817 secs]
Jan 08 09:37:16 vic-st-h2-132.eng.vmware.com start_admiral.sh[1810]: Exception in thread "Lucene Merge Thread #6206" org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: GC overhead limit exceeded
Jan 08 09:37:16 vic-st-h2-132.eng.vmware.com start_admiral.sh[1810]:         at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:703)
Jan 08 09:37:16 vic-st-h2-132.eng.vmware.com start_admiral.sh[1810]:         at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:683)
Jan 08 09:37:16 vic-st-h2-132.eng.vmware.com start_admiral.sh[1810]: Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
Jan 08 09:37:25 vic-st-h2-132.eng.vmware.com docker[711286]: vic-admiral
Jan 08 09:37:25 vic-st-h2-132.eng.vmware.com docker[711294]: vic-admiral
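
The effect of that `status=0/SUCCESS` can be illustrated with a simplified model of systemd's `Restart=` setting. The unit's actual `Restart=` value does not appear in this thread, so `on-failure` below is an assumption. The model only looks at the exit status of a process that exited on its own; it ignores signals, timeouts, watchdog events, and `SuccessExitStatus=`. The function names are hypothetical.

```python
import re

# Exit summary as printed by `systemctl status`, e.g. "(code=exited, status=0/SUCCESS)".
EXIT_RE = re.compile(r"\(code=exited, status=(\d+)(?:/\w+)?\)")


def parse_exit_status(status_line):
    """Pull the numeric exit status out of a `systemctl status` process line."""
    match = EXIT_RE.search(status_line)
    if match is None:
        raise ValueError(f"no exit status found in: {status_line!r}")
    return int(match.group(1))


def would_restart(restart_policy, exit_status):
    """Simplified model: decide on a restart from the exit status alone."""
    clean_exit = exit_status == 0
    if restart_policy == "always":
        return True
    if restart_policy == "on-failure":
        return not clean_exit
    if restart_policy == "on-success":
        return clean_exit
    if restart_policy == "no":
        return False
    raise ValueError(f"policy not covered by this sketch: {restart_policy!r}")


# Copied from the status output above.
MAIN_PID_LINE = "Main PID: 1810 (code=exited, status=0/SUCCESS)"

if __name__ == "__main__":
    status = parse_exit_status(MAIN_PID_LINE)
    # Assumed policy; the real unit file is not shown in this thread.
    print("restart under on-failure:", would_restart("on-failure", status))
```

Under this model, a start script that exits with 0 after the JVM fails looks like a clean shutdown to systemd, so an `on-failure` policy never fires.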

DanielXiao avatar Jan 09 '19 08:01 DanielXiao

We are bringing down VIC scale on the VC for RC3 testing.

jitinkumar2018 avatar Jan 09 '19 10:01 jitinkumar2018

@lazarin could you please provide a new build so that we can generate rc4 and ask Jitin to verify it today? We want to declare RTM today, thanks.

renmaosheng avatar Jan 10 '19 03:01 renmaosheng

@jitinkumar2018 tag vic_v1.5.0-rc4 was published

lazarin avatar Jan 10 '19 09:01 lazarin

https://storage.googleapis.com/vic-product-ova-builds/vic-dev-v1.5.0-rc4-6889-009b4399.ova includes the fix for this issue.

DanielXiao avatar Jan 10 '19 09:01 DanielXiao