Fails to push metrics to Stackdriver with short step interval and large batch size
Hi, I'm using Micrometer 1.1.3 and Spring Boot 2.1.0.RELEASE. StackdriverMeterRegistry is defined as a Spring bean:
@Bean
public static StackdriverMeterRegistry stackdriver() {
    return StackdriverMeterRegistry.builder(new StackdriverConfig() {
        @Override
        public String projectId() {
            return "test-project";
        }

        @Override
        public String get(String key) {
            return null;
        }

        @Override
        public Duration step() {
            return Duration.ofSeconds(5);
        }
    }).build();
}
However, when I run the app on GCP, I keep seeing the following warning without any stack trace:
failed to send metrics to stackdriver: INTERNAL: http2 exception
It's strange because I can push metrics from the same application and environment using just the google-cloud-monitoring library: https://cloud.google.com/monitoring/docs/reference/libraries#client-libraries-usage-java
Could you please let me know if I'm missing something in the configuration?
Thanks, Gleb
@mglebka Thanks for the report! I haven't looked into this yet, but the missing stack trace is not aligned with other meter registries, so I created #1269 to resolve that.
I reproduced the error with a sample and the stack trace was as follows:
2019-03-09 05:23:56.139 WARN 82634 --- [trics-publisher] i.m.s.StackdriverMeterRegistry : failed to send metrics to stackdriver
com.google.api.gax.rpc.InternalException: io.grpc.StatusRuntimeException: INTERNAL: http2 exception
at com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:67) ~[gax-1.34.0.jar:1.34.0]
at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:72) ~[gax-grpc-1.34.0.jar:1.34.0]
at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:60) ~[gax-grpc-1.34.0.jar:1.34.0]
at com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure(GrpcExceptionCallable.java:97) ~[gax-grpc-1.34.0.jar:1.34.0]
at com.google.api.core.ApiFutures$1.onFailure(ApiFutures.java:68) ~[api-common-1.7.0.jar:na]
at com.google.common.util.concurrent.Futures$4.run(Futures.java:1123) ~[guava-20.0.jar:na]
at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:435) ~[guava-20.0.jar:na]
at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:900) ~[guava-20.0.jar:na]
at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:811) ~[guava-20.0.jar:na]
at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:675) ~[guava-20.0.jar:na]
at io.grpc.stub.ClientCalls$GrpcFuture.setException(ClientCalls.java:507) ~[grpc-stub-1.15.0.jar:1.15.0]
at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:482) ~[grpc-stub-1.15.0.jar:1.15.0]
at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) ~[grpc-core-1.15.0.jar:1.15.0]
at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) ~[grpc-core-1.15.0.jar:1.15.0]
at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) ~[grpc-core-1.15.0.jar:1.15.0]
at io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:678) ~[grpc-core-1.15.0.jar:1.15.0]
at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) ~[grpc-core-1.15.0.jar:1.15.0]
at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) ~[grpc-core-1.15.0.jar:1.15.0]
at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) ~[grpc-core-1.15.0.jar:1.15.0]
at io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:403) ~[grpc-core-1.15.0.jar:1.15.0]
at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459) ~[grpc-core-1.15.0.jar:1.15.0]
at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63) ~[grpc-core-1.15.0.jar:1.15.0]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:546) ~[grpc-core-1.15.0.jar:1.15.0]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:467) ~[grpc-core-1.15.0.jar:1.15.0]
at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:584) ~[grpc-core-1.15.0.jar:1.15.0]
at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) ~[grpc-core-1.15.0.jar:1.15.0]
at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) ~[grpc-core-1.15.0.jar:1.15.0]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_101]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_101]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_101]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[na:1.8.0_101]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_101]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_101]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_101]
Suppressed: com.google.api.gax.rpc.AsyncTaskException: Asynchronous task failed
at com.google.api.gax.rpc.ApiExceptions.callAndTranslateApiException(ApiExceptions.java:57) ~[gax-1.34.0.jar:1.34.0]
at com.google.api.gax.rpc.UnaryCallable.call(UnaryCallable.java:112) ~[gax-1.34.0.jar:1.34.0]
at com.google.cloud.monitoring.v3.MetricServiceClient.createTimeSeries(MetricServiceClient.java:1156) ~[google-cloud-monitoring-1.50.0.jar:1.50.0]
at io.micrometer.stackdriver.StackdriverMeterRegistry.publish(StackdriverMeterRegistry.java:163) ~[micrometer-registry-stackdriver-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_101]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) ~[na:1.8.0_101]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_101]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) ~[na:1.8.0_101]
... 3 common frames omitted
Caused by: io.grpc.StatusRuntimeException: INTERNAL: http2 exception
at io.grpc.Status.asRuntimeException(Status.java:526) ~[grpc-core-1.15.0.jar:1.15.0]
... 23 common frames omitted
Caused by: io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2Exception$HeaderListSizeException: Header size exceeded max allowed size (8192)
at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2Exception.headerListSizeError(Http2Exception.java:171) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2CodecUtil.headerListSizeExceeded(Http2CodecUtil.java:228) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.handler.codec.http2.HpackDecoder$Http2HeadersSink.finish(HpackDecoder.java:541) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.handler.codec.http2.HpackDecoder.decode(HpackDecoder.java:128) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2HeadersDecoder.decodeHeaders(DefaultHttp2HeadersDecoder.java:127) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader$HeadersBlockBuilder.headers(DefaultHttp2FrameReader.java:745) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader$2.processFragment(DefaultHttp2FrameReader.java:483) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readHeadersFrame(DefaultHttp2FrameReader.java:491) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:254) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:160) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:118) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:390) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:450) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1407) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1177) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1221) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:646) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:581) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:460) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
at io.grpc.netty.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
... 1 common frames omitted
There were 66 metrics and no problem with the default step size of 1 minute; I changed it to 5 seconds to reproduce the error. If I change the batch size to 10, it works with the 5-second step size, although there are some unrelated failures when sending metrics.
I haven't looked into how this contributes to the HTTP header size yet.
@mglebka Could you try it with a smaller batch size?
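For reference, batchSize() is inherited from the base push/step registry config that StackdriverConfig extends, so it can be overridden the same way as step() in the bean at the top of this issue. A minimal sketch, reusing that anonymous config with the batch size of 10 that worked in the reproduction above:

import io.micrometer.stackdriver.StackdriverConfig;
import io.micrometer.stackdriver.StackdriverMeterRegistry;
import org.springframework.context.annotation.Bean;

@Bean
public static StackdriverMeterRegistry stackdriver() {
    return StackdriverMeterRegistry.builder(new StackdriverConfig() {
        @Override
        public String projectId() {
            return "test-project";
        }

        @Override
        public String get(String key) {
            return null; // fall back to defaults for everything else
        }

        @Override
        public int batchSize() {
            return 10; // publish meters in batches of 10 instead of the default 10000
        }
    }).build();
}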
@izeye Thanks for reproducing it. The earliest I can try is Monday; I will let you know if it works.
@izeye I tried the default step size of 1 minute and the exception no longer occurs; however, I keep seeing a few other exceptions:
failed to send metrics to stackdriver: INVALID_ARGUMENT: Field timeSeries[44].points[0].distributionValue had an invalid value: Distribution bucket_counts(43) has a negative count.
and
i.m.s.StackdriverMeterRegistry : failed to send metrics to stackdriver: INTERNAL: An internal error occurred.
I'm also facing this issue; one of my buckets is negative. I did a little debugging like so:
if (latencyNanos < 0) {
  print(s"got latency < 0 ${latencyNanos} ${name}")
}
Timer
  .builder("akka.streams.upstreamlatency")
  .tags("step", name)
  .sla(Micrometer.timerPercentiles: _*)
  .description("Difference between a push and the last pull, gives a reading of how much time the upstream takes to produce an element after it's been requested")
  .register(micrometer)
  .record(latencyNanos, TimeUnit.NANOSECONDS)
And I'm definitely not sending negative values in; somehow the values are becoming negative in the conversions.
@LukeDefeo @mglebka I was facing the same problem and created a separate issue for it, as it seems to be unrelated to the http2 exception from the original post. As a local workaround I copied the StackdriverMeterRegistry class and changed a line, which could be useful for you. Here is the issue I opened, if you're interested: #1325
I think we need to understand what affects the header size so we can avoid sending a batch of metrics that exceeds the allowed header size. It is strange to me, though, that the step interval seems to affect this; I would expect the batch size alone to determine the header size.
Is there a known workaround for this? Either by changing the batch size or the step?
What filter name did you use to find the Micrometer latency (@timed) metric in the screenshot below?
@pedro-carneiro As mentioned in previous comments, it seems increasing the step size (say to the default of 1 minute) and/or reducing the batch size to something smaller works around the error. We'll still need to get to the bottom of why this is happening at all.
I had tried different combinations, yet I was only able to reduce the number of errors, not avoid them entirely...
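For anyone on a Spring Boot version that auto-configures the Stackdriver registry, both knobs discussed above (step and batch size) can also be set through properties. A hedged sketch, assuming the Boot 2.x property prefix (the prefix moved in Boot 3.x):

management:
  metrics:
    export:
      stackdriver:
        project-id: test-project   # placeholder
        step: 1m                   # the default; the error was reproduced with a shorter step
        batch-size: 10             # the smaller batch size that worked in the reproduction above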
I'm able to set the max header size with the code below (Quarkus/Kotlin); maxInboundMetadataSize roughly controls the max header size.
/*
 * Micrometer customisation
 * Look at [io.quarkus.micrometer.runtime.MicrometerRecorder], where beanManager is used.
 * Can be customised with MeterFilter, MeterRegistry, MeterBinder, MeterFilterConstraint
 */
@Produces
@Singleton
@IfBuildProfile("prod")
fun produceMeterRegistry(): MeterRegistry {
    val config = object : StackdriverConfig {
        override fun projectId(): String {
            return gcpProjectID
        }

        override fun get(key: String): String? = null
    }
    return StackdriverMeterRegistry
        .builder(config)
        .metricServiceSettings {
            val builder = MetricServiceSettings.newBuilder()
            if (config.credentials() != null) {
                builder.credentialsProvider = config.credentials()
            }
            builder.transportChannelProvider =
                MetricServiceSettings.defaultGrpcTransportProviderBuilder().setChannelConfigurator { channelBuilder ->
                    when (channelBuilder) {
                        is AbstractManagedChannelImplBuilder<*> -> channelBuilder.maxInboundMetadataSize(1024 * 1024)
                        else -> channelBuilder
                    }
                }.build()
            builder.headerProvider = HeaderProvider {
                val version = StackdriverMeterRegistry::class.java.`package`.implementationVersion
                mapOf(
                    "User-Agent" to "Micrometer/$version micrometer-registry-stackdriver/$version"
                )
            }
            builder.build()
        }
        .build()
}
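For a plain Spring/Java setup, a rough equivalent of the Kotlin snippet above might look like the sketch below. On recent grpc-java versions maxInboundMetadataSize is available directly on ManagedChannelBuilder, so the AbstractManagedChannelImplBuilder check is not needed; the 1 MiB value is only illustrative, and the credentials/header-provider wiring from the Kotlin example is omitted.

import com.google.cloud.monitoring.v3.MetricServiceSettings;
import io.micrometer.stackdriver.StackdriverConfig;
import io.micrometer.stackdriver.StackdriverMeterRegistry;
import org.springframework.context.annotation.Bean;

@Bean
public static StackdriverMeterRegistry stackdriver() {
    StackdriverConfig config = new StackdriverConfig() {
        @Override
        public String projectId() {
            return "test-project"; // placeholder
        }

        @Override
        public String get(String key) {
            return null;
        }
    };
    return StackdriverMeterRegistry.builder(config)
            .metricServiceSettings(() -> MetricServiceSettings.newBuilder()
                    .setTransportChannelProvider(
                            MetricServiceSettings.defaultGrpcTransportProviderBuilder()
                                    // Raise the client's max inbound metadata (header) size so the oversized
                                    // error responses from monitoring.googleapis.com can still be decoded.
                                    .setChannelConfigurator(channelBuilder ->
                                            channelBuilder.maxInboundMetadataSize(1024 * 1024))
                                    .build())
                    .build())
            .build();
}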
Hi, I am also encountering this problem. The HTTP/2 failure is caused by monitoring.googleapis.com sending back a response with overly large headers. This is in violation of the HTTP/2 spec, which is why the underlying reason is hard to find. I don't know why they are ignoring the max header size, but that's a separate issue.
The header block includes a large error message, very close to the 8 KB limit, which I have included below. The error message is included in the grpc-message status field, but is also replicated into grpc-status-details-bin, effectively doubling the size. This turns a 4 KB message into an 8 KB one, pushing it over the limit. It seems to indicate that the sampling rate is too high. I am not as familiar with tuning this, but it seems like less frequent, larger batches may help.
One or more TimeSeries could not be written: One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[4]: custom.googleapis.com/process/uptime{}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[6]: custom.googleapis.com/system/load/average/1m{}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[24,25,74]: custom.googleapis.com/logback/events{levelwarn}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[5,20,28,31,33,73]: custom.googleapis.com/jvm/memory/used{areanonheap,idCompressed Class Space}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[0,53,70,97]: custom.googleapis.com/jvm/threads/states{stateblocked}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[52]: custom.googleapis.com/jvm/threads/daemon{}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[21]: custom.googleapis.com/jvm/buffer/total/capacity{iddirect}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[86]: custom.googleapis.com/jvm/classes/unloaded{}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[90]: custom.googleapis.com/jvm/threads/peak{}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[51]: custom.googleapis.com/process/start/time{}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[26,30,41,80,92]: custom.googleapis.com/jvm/memory/max{idCompressed Class Space,areanonheap}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[27,75]: custom.googleapis.com/jvm/buffer/count{idmapped}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[62]: custom.googleapis.com/jvm/gc/live/data/size{}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[43]: custom.googleapis.com/system/cpu/count{}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[54,55,57,76,79,87,93]: custom.googleapis.com/jvm/memory/committed{idCodeHeap 'non-nmethods',areanonheap}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[63]: custom.googleapis.com/jvm/buffer/memory/used{iddirect}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[84]: custom.googleapis.com/jvm/threads/live{}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[89]: custom.googleapis.com/jvm/classes/loaded{}; One or more points were written more frequently than the maximum sampling period configured 
for the metric.: global{} timeSeries[61]: custom.googleapis.com/process/files/open{}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[95]: custom.googleapis.com/jvm/gc/memory/promoted{}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[22]: custom.googleapis.com/system/cpu/usage{}
Another error message that pushes the limit:
One or more TimeSeries could not be written: Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[84]: custom.googleapis.com/jvm/threads/live{}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[5,20,28,31,33,73]: custom.googleapis.com/jvm/memory/used{idCompressed Class Space,areanonheap}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[86]: custom.googleapis.com/jvm/classes/unloaded{}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[4]: custom.googleapis.com/process/uptime{}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[21]: custom.googleapis.com/jvm/buffer/total/capacity{iddirect}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[24,25,74]: custom.googleapis.com/logback/events{levelwarn}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[27,75]: custom.googleapis.com/jvm/buffer/count{idmapped}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[62]: custom.googleapis.com/jvm/gc/live/data/size{}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[51]: custom.googleapis.com/process/start/time{}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[6]: custom.googleapis.com/system/load/average/1m{}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[54,55,57,76,79,87,93]: custom.googleapis.com/jvm/memory/committed{areanonheap,idCodeHeap 'non-nmethods'}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[63]: custom.googleapis.com/jvm/buffer/memory/used{iddirect}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[90]: custom.googleapis.com/jvm/threads/peak{}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[22]: custom.googleapis.com/system/cpu/usage{}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[95]: custom.googleapis.com/jvm/gc/memory/promoted{}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[43]: custom.googleapis.com/system/cpu/count{}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[0,53,70,97]: custom.googleapis.com/jvm/threads/states{stateblocked}; Points must be written in order. 
One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[26,30,41,80,92]: custom.googleapis.com/jvm/memory/max{idCompressed Class Space,areanonheap}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[52]: custom.googleapis.com/jvm/threads/daemon{}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[61]: custom.googleapis.com/process/files/open{}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[89]: custom.googleapis.com/jvm/classes/loaded{}
I found the solution for me: I had to add custom tags to the metrics to ensure that they are unique across services/instances:
management:
  metrics:
    tags:
      app: my-unique-service-label
      instance: ${random.uuid}
See also:
- https://stackoverflow.com/questions/61500324/unable-to-publish-spring-boot-metrics-to-gcp-stackdriver/73386941#73386941
- https://github.com/GoogleCloudPlatform/spring-cloud-gcp/issues/1223
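If you prefer doing this in code instead of properties, roughly the same effect can be sketched with Spring Boot's MeterRegistryCustomizer (the bean name and tag values here are illustrative):

import java.util.UUID;

import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.boot.actuate.autoconfigure.metrics.MeterRegistryCustomizer;
import org.springframework.context.annotation.Bean;

@Bean
public MeterRegistryCustomizer<MeterRegistry> uniqueInstanceTags() {
    // Tag every meter so that time series are unique per service instance,
    // mirroring the management.metrics.tags properties above.
    return registry -> registry.config()
            .commonTags("app", "my-unique-service-label",
                    "instance", UUID.randomUUID().toString());
}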
@carl-mastrangelo How did you get those messages from the header?
@smolarek999 You can configure the limit using this method: https://grpc.github.io/grpc-java/javadoc/io/grpc/ManagedChannelBuilder.html#maxInboundMetadataSize-int-
See this comment for an example: https://github.com/micrometer-metrics/micrometer/issues/1268#issuecomment-1081594772
@carl-mastrangelo How did you get those messages from the header?
This bug is reproducible locally, so I stepped through in a debugger until I could see the values in Netty.
Is this still a problem? I'm asking because someone has pointed out a solution here.
If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.
There are a few confounding issues, but I was able to come up with a workaround.
That said, it would be nice if Micrometer opened up the constructor of StackdriverMeterRegistry:
https://github.com/micrometer-metrics/micrometer/blob/d904d760198c5e2e14b985b8ca0d7ef76bb7957a/implementations/micrometer-registry-stackdriver/src/main/java/io/micrometer/stackdriver/StackdriverMeterRegistry.java#L97
There are multiple parts to the workaround, but the one Micrometer can help with is opening up the visibility of that constructor. We need to be able to set the HTTP/2 settings on the client, and we cannot pass them in today. My hack is to use reflection to make that constructor accessible, which isn't ideal.
(Aside: Micrometer is fundamentally incompatible with GCP's Cloud Run, which explicitly does not support custom metrics. (Even more of an aside, they seem to have modified that doc to suggest a sidecar rather than in-process reporting.) They instead suggest using log-based metrics, which are not as capable. My hack is to report metrics as the Generic Node resource type, which does accept an instance ID, as mentioned in the suggested answer you linked to.)
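A hedged sketch of that Generic Node idea, only to show where the resource type is set. resourceType() is part of StackdriverConfig (the default is "global"); whether the node labels Stackdriver expects for generic_node (location, namespace, node_id) can be supplied depends on the Micrometer version, so treat this as an illustration rather than a complete Cloud Run recipe.

StackdriverConfig config = new StackdriverConfig() {
    @Override
    public String projectId() {
        return "test-project"; // placeholder
    }

    @Override
    public String get(String key) {
        return null;
    }

    @Override
    public String resourceType() {
        return "generic_node"; // instead of the default "global"
    }
};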
If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.
Closing due to lack of requested feedback. If you would like us to look at this issue, please provide the requested information and we will re-open.
It is still a problem: for example, OTel auto-instrumentation keeps failing on GCP because it cannot send profiles to Cloud Profiler.