bazel-buildfarm

Worker gets stuck in unusable state on startup

Open • 80degreeswest opened this issue 4 years ago • 1 comment

This has been observed multiple times: a GPU worker tries to start up and gets stuck in an unusable state. Below are the only logs from that worker's container.

Picked up _JAVA_OPTIONS: -Djava.util.logging.config.file=/var/lib/buildfarm-shard-worker/logging.properties -Djava.util.logging.config.level=INFO -Xmx96g
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
[INFO ] build.buildfarm.worker.shard.Worker <init> - buildfarm-worker-10.35.219.237:8981-523405f1-ebd7-493f-b170-666eccdddd91 initialized
[INFO ] build.buildfarm.cas.cfc.CASFileCache start - Initializing cache at: /var/buildfarm/worker/cache
[INFO ] build.buildfarm.cas.cfc.CASFileCache joinThreads - Scanning Cache Root...
[INFO ] build.buildfarm.cas.cfc.CASFileCache logCacheScanResults - {"keys":78206,"dirs":26768,"delete":14144}
[INFO ] build.buildfarm.cas.cfc.CASFileCache joinThreads - Populating Directories...

80degreeswest avatar May 27 '21 16:05 80degreeswest

Can you share a jstack for the stuck worker? Also, what are your buckets set to?

werkt avatar May 27 '21 16:05 werkt