
Fails to push metrics to Stackdriver with short step interval and large batch size

Open GlebMendrul opened this issue 6 years ago • 18 comments

Hi, I'm using Micrometer version 1.1.3 and Spring Boot 2.1.0.RELEASE. StackdriverMeterRegistry is defined as a Spring bean:

@Bean
  public static StackdriverMeterRegistry stackdriver() {
    return StackdriverMeterRegistry.builder(new StackdriverConfig() {
      @Override
      public String projectId() {
        return "test-project";
      }

      @Override
      public String get(String key) {
        return null;
      }

      @Override
      public Duration step() {
        return Duration.ofSeconds(5);
      }
    }).build();
  }

However, when I run the app on GCP I keep seeing the following warning without any stack trace: failed to send metrics to stackdriver: INTERNAL: http2 exception

It's strange because I can push metrics from the same application and environment using just google-cloud-monitoring library: https://cloud.google.com/monitoring/docs/reference/libraries#client-libraries-usage-java

Could you please let me know if I'm missing something in the configuration?

Thanks, Gleb

GlebMendrul avatar Mar 08 '19 15:03 GlebMendrul

@mglebka Thanks for the report! I haven't looked into this yet, but logging the failure without a stack trace is inconsistent with other meter registries, so I created #1269 to resolve that.

izeye avatar Mar 08 '19 17:03 izeye

I reproduced the error with a sample and the stack trace was as follows:

2019-03-09 05:23:56.139  WARN 82634 --- [trics-publisher] i.m.s.StackdriverMeterRegistry           : failed to send metrics to stackdriver

com.google.api.gax.rpc.InternalException: io.grpc.StatusRuntimeException: INTERNAL: http2 exception
	at com.google.api.gax.rpc.ApiExceptionFactory.createException(ApiExceptionFactory.java:67) ~[gax-1.34.0.jar:1.34.0]
	at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:72) ~[gax-grpc-1.34.0.jar:1.34.0]
	at com.google.api.gax.grpc.GrpcApiExceptionFactory.create(GrpcApiExceptionFactory.java:60) ~[gax-grpc-1.34.0.jar:1.34.0]
	at com.google.api.gax.grpc.GrpcExceptionCallable$ExceptionTransformingFuture.onFailure(GrpcExceptionCallable.java:97) ~[gax-grpc-1.34.0.jar:1.34.0]
	at com.google.api.core.ApiFutures$1.onFailure(ApiFutures.java:68) ~[api-common-1.7.0.jar:na]
	at com.google.common.util.concurrent.Futures$4.run(Futures.java:1123) ~[guava-20.0.jar:na]
	at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:435) ~[guava-20.0.jar:na]
	at com.google.common.util.concurrent.AbstractFuture.executeListener(AbstractFuture.java:900) ~[guava-20.0.jar:na]
	at com.google.common.util.concurrent.AbstractFuture.complete(AbstractFuture.java:811) ~[guava-20.0.jar:na]
	at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:675) ~[guava-20.0.jar:na]
	at io.grpc.stub.ClientCalls$GrpcFuture.setException(ClientCalls.java:507) ~[grpc-stub-1.15.0.jar:1.15.0]
	at io.grpc.stub.ClientCalls$UnaryStreamToFuture.onClose(ClientCalls.java:482) ~[grpc-stub-1.15.0.jar:1.15.0]
	at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) ~[grpc-core-1.15.0.jar:1.15.0]
	at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) ~[grpc-core-1.15.0.jar:1.15.0]
	at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) ~[grpc-core-1.15.0.jar:1.15.0]
	at io.grpc.internal.CensusStatsModule$StatsClientInterceptor$1$1.onClose(CensusStatsModule.java:678) ~[grpc-core-1.15.0.jar:1.15.0]
	at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39) ~[grpc-core-1.15.0.jar:1.15.0]
	at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23) ~[grpc-core-1.15.0.jar:1.15.0]
	at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40) ~[grpc-core-1.15.0.jar:1.15.0]
	at io.grpc.internal.CensusTracingModule$TracingClientInterceptor$1$1.onClose(CensusTracingModule.java:403) ~[grpc-core-1.15.0.jar:1.15.0]
	at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459) ~[grpc-core-1.15.0.jar:1.15.0]
	at io.grpc.internal.ClientCallImpl.access$300(ClientCallImpl.java:63) ~[grpc-core-1.15.0.jar:1.15.0]
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:546) ~[grpc-core-1.15.0.jar:1.15.0]
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$600(ClientCallImpl.java:467) ~[grpc-core-1.15.0.jar:1.15.0]
	at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:584) ~[grpc-core-1.15.0.jar:1.15.0]
	at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37) ~[grpc-core-1.15.0.jar:1.15.0]
	at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123) ~[grpc-core-1.15.0.jar:1.15.0]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_101]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_101]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_101]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[na:1.8.0_101]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_101]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_101]
	at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_101]
	Suppressed: com.google.api.gax.rpc.AsyncTaskException: Asynchronous task failed
		at com.google.api.gax.rpc.ApiExceptions.callAndTranslateApiException(ApiExceptions.java:57) ~[gax-1.34.0.jar:1.34.0]
		at com.google.api.gax.rpc.UnaryCallable.call(UnaryCallable.java:112) ~[gax-1.34.0.jar:1.34.0]
		at com.google.cloud.monitoring.v3.MetricServiceClient.createTimeSeries(MetricServiceClient.java:1156) ~[google-cloud-monitoring-1.50.0.jar:1.50.0]
		at io.micrometer.stackdriver.StackdriverMeterRegistry.publish(StackdriverMeterRegistry.java:163) ~[micrometer-registry-stackdriver-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT]
		at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_101]
		at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) ~[na:1.8.0_101]
		at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_101]
		at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) ~[na:1.8.0_101]
		... 3 common frames omitted
Caused by: io.grpc.StatusRuntimeException: INTERNAL: http2 exception
	at io.grpc.Status.asRuntimeException(Status.java:526) ~[grpc-core-1.15.0.jar:1.15.0]
	... 23 common frames omitted
Caused by: io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2Exception$HeaderListSizeException: Header size exceeded max allowed size (8192)
	at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2Exception.headerListSizeError(Http2Exception.java:171) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2CodecUtil.headerListSizeExceeded(Http2CodecUtil.java:228) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.handler.codec.http2.HpackDecoder$Http2HeadersSink.finish(HpackDecoder.java:541) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.handler.codec.http2.HpackDecoder.decode(HpackDecoder.java:128) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2HeadersDecoder.decodeHeaders(DefaultHttp2HeadersDecoder.java:127) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader$HeadersBlockBuilder.headers(DefaultHttp2FrameReader.java:745) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader$2.processFragment(DefaultHttp2FrameReader.java:483) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readHeadersFrame(DefaultHttp2FrameReader.java:491) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.processPayloadState(DefaultHttp2FrameReader.java:254) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2FrameReader.readFrame(DefaultHttp2FrameReader.java:160) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2InboundFrameLogger.readFrame(Http2InboundFrameLogger.java:41) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.handler.codec.http2.DefaultHttp2ConnectionDecoder.decodeFrame(DefaultHttp2ConnectionDecoder.java:118) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler$FrameDecoder.decode(Http2ConnectionHandler.java:390) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.handler.codec.http2.Http2ConnectionHandler.decode(Http2ConnectionHandler.java:450) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1407) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1177) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1221) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:646) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:581) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:498) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:460) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	at io.grpc.netty.shaded.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[grpc-netty-shaded-1.15.0.jar:1.15.0]
	... 1 common frames omitted

There were 66 metrics and no problem with the default step size of 1 minute. I changed it to 5 seconds to reproduce the error. If I also changed the batch size to 10, it worked with the 5-second step size, although there were some unrelated failures when sending metrics.

I haven't looked into how this contributes to the HTTP header size yet.

izeye avatar Mar 08 '19 20:03 izeye

@mglebka Could you try it with a smaller batch size?

izeye avatar Mar 08 '19 20:03 izeye

@izeye, thanks for reproducing it. The earliest I can try is Monday; I'll let you know if it works.

GlebMendrul avatar Mar 09 '19 14:03 GlebMendrul

@izeye, I tried the default step size of 1 minute and the exception no longer occurs. However, I keep seeing a few other errors: failed to send metrics to stackdriver: INVALID_ARGUMENT: Field timeSeries[44].points[0].distributionValue had an invalid value: Distribution bucket_counts(43) has a negative count. and i.m.s.StackdriverMeterRegistry : failed to send metrics to stackdriver: INTERNAL: An internal error occurred.

GlebMendrul avatar Mar 11 '19 16:03 GlebMendrul

I'm also facing this issue; one of my buckets is negative. I did a little debugging like so:


if (latencyNanos < 0) {
  print(s"got latency < 0 ${latencyNanos} ${name}")
}

Timer
  .builder("akka.streams.upstreamlatency")
  .tags("step", name)
  .sla(Micrometer.timerPercentiles: _*)
  .description("Difference between a push and the last pull, gives a reading of how much time the upstream takes to produce an element after it's been requested")
  .register(micrometer)
  .record(latencyNanos, TimeUnit.NANOSECONDS)
And I'm definitely not sending negative values in; somewhere in the conversions the values are becoming negative.

LukeDefeo avatar Mar 18 '19 14:03 LukeDefeo

@LukeDefeo @mglebka I was facing the same problem and created a separate issue for it, as it seems to be unrelated to the http2 exception from the original post. As a local workaround I copied the StackdriverMeterRegistry class and changed one line, which could be useful for you. Here is the issue I opened, if you're interested: #1325

robertalpha avatar Mar 26 '19 13:03 robertalpha

I think we need to understand what affects the header size so we can prevent making a batch of metrics to send that exceeds the allowed header size. It is strange to me though that the step interval seems to affect this. I would think the batch size alone would affect the header size.

shakuzen avatar Apr 07 '19 05:04 shakuzen

Is there a known workaround for this? Either by changing the batch size or the step?

pedro-carneiro avatar Jul 04 '19 13:07 pedro-carneiro

What filter name did you use to find the Micrometer latency (@Timed) metric in the screen below? (screenshot omitted)

shubh0210 avatar Nov 07 '19 06:11 shubh0210

Is there a known workaround for this? Either by changing the batch size or the step?

@pedro-carneiro As mentioned in previous comments, it seems that increasing the step size (say, back to the default of 1 minute) and/or reducing the batch size works around the error (see the sketch below). We'll still need to get to the bottom of why this is happening at all.
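
A sketch of that workaround, building on the bean from the original post (the project ID is the same placeholder, and a batch size of 10 is illustrative rather than a recommendation):

@Bean
  public static StackdriverMeterRegistry smallBatchStackdriver() {
    return StackdriverMeterRegistry.builder(new StackdriverConfig() {
      @Override
      public String projectId() {
        return "test-project"; // placeholder project
      }

      @Override
      public String get(String key) {
        return null; // accept defaults for everything else
      }

      @Override
      public Duration step() {
        return Duration.ofMinutes(1); // keep the default 1-minute step
      }

      @Override
      public int batchSize() {
        return 10; // smaller batches keep each request (and any error response) small
      }
    }).build();
  }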

shakuzen avatar Apr 13 '20 15:04 shakuzen

I had tried different combinations, yet I was only able to reduce the number of errors, not avoid them entirely...

pedro-carneiro avatar Apr 13 '20 15:04 pedro-carneiro

I'm able to set the max header size with the code below (Quarkus/Kotlin); maxInboundMetadataSize roughly controls the maximum header size.

    /*
     * Micrometer customisation
     * Look at [io.quarkus.micrometer.runtime.MicrometerRecorder], where beanManager is used.
     * Can be customised with MeterFilter, MeterRegistry, MeterBinder, MeterFilterConstraint
     */
    @Produces
    @Singleton
    @IfBuildProfile("prod")
    fun produceMeterRegistry(): MeterRegistry {
        val config = object : StackdriverConfig {
            override fun projectId(): String {
                return gcpProjectID
            }

            override fun get(key: String): String? = null
        }
        return StackdriverMeterRegistry
            .builder(config)
            .metricServiceSettings {
                val builder = MetricServiceSettings.newBuilder()
                if (config.credentials() != null) {
                    builder.credentialsProvider = config.credentials()
                }
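                // Raise the gRPC client's maximum inbound metadata (header) size so that
                // large Stackdriver error responses do not trip the default 8 KB limit.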
                builder.transportChannelProvider =
                    MetricServiceSettings.defaultGrpcTransportProviderBuilder().setChannelConfigurator { builder ->
                        when (builder) {
                            is AbstractManagedChannelImplBuilder<*> -> builder.maxInboundMetadataSize(1024 * 1024)
                            else -> builder
                        }
                    }.build()
                builder.headerProvider = HeaderProvider {
                    val version = StackdriverMeterRegistry::class.java.`package`.implementationVersion
                    mapOf(
                        "User-Agent" to "Micrometer/$version micrometer-registry-stackdriver/$version"
                    )
                }
                builder.build()
            }
            .build()
    }

danelowe avatar Mar 29 '22 08:03 danelowe

Hi, I am also encountering this problem. The HTTP/2 problem is due to monitoring.googleapis.com sending back an overly large response. This is in violation of the HTTP/2 spec, which is why the underlying reason is hard to find. I don't know why they are ignoring the max header size, but that's a separate thing.

The header block includes a large error message, very close to the 8 KB limit, which I have included below. The error message is carried in the grpc-message status field, but it is also replicated in grpc-status-details-bin, effectively doubling its size; this turns a roughly 4 KB message into 8 KB and pushes it over the limit. The message itself indicates that the sampling rate is too high. I am not as familiar with tuning this, but it seems that less frequent, larger batches may help.

One or more TimeSeries could not be written: One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[4]: custom.googleapis.com/process/uptime{}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[6]: custom.googleapis.com/system/load/average/1m{}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[24,25,74]: custom.googleapis.com/logback/events{levelwarn}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[5,20,28,31,33,73]: custom.googleapis.com/jvm/memory/used{areanonheap,idCompressed Class Space}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[0,53,70,97]: custom.googleapis.com/jvm/threads/states{stateblocked}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[52]: custom.googleapis.com/jvm/threads/daemon{}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[21]: custom.googleapis.com/jvm/buffer/total/capacity{iddirect}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[86]: custom.googleapis.com/jvm/classes/unloaded{}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[90]: custom.googleapis.com/jvm/threads/peak{}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[51]: custom.googleapis.com/process/start/time{}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[26,30,41,80,92]: custom.googleapis.com/jvm/memory/max{idCompressed Class Space,areanonheap}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[27,75]: custom.googleapis.com/jvm/buffer/count{idmapped}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[62]: custom.googleapis.com/jvm/gc/live/data/size{}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[43]: custom.googleapis.com/system/cpu/count{}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[54,55,57,76,79,87,93]: custom.googleapis.com/jvm/memory/committed{idCodeHeap 'non-nmethods',areanonheap}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[63]: custom.googleapis.com/jvm/buffer/memory/used{iddirect}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[84]: custom.googleapis.com/jvm/threads/live{}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[89]: custom.googleapis.com/jvm/classes/loaded{}; One or more points were written more frequently than the maximum sampling period configured 
for the metric.: global{} timeSeries[61]: custom.googleapis.com/process/files/open{}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[95]: custom.googleapis.com/jvm/gc/memory/promoted{}; One or more points were written more frequently than the maximum sampling period configured for the metric.: global{} timeSeries[22]: custom.googleapis.com/system/cpu/usage{}

Another error message that pushes the limit:

One or more TimeSeries could not be written: Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[84]: custom.googleapis.com/jvm/threads/live{}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[5,20,28,31,33,73]: custom.googleapis.com/jvm/memory/used{idCompressed Class Space,areanonheap}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[86]: custom.googleapis.com/jvm/classes/unloaded{}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[4]: custom.googleapis.com/process/uptime{}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[21]: custom.googleapis.com/jvm/buffer/total/capacity{iddirect}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[24,25,74]: custom.googleapis.com/logback/events{levelwarn}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[27,75]: custom.googleapis.com/jvm/buffer/count{idmapped}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[62]: custom.googleapis.com/jvm/gc/live/data/size{}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[51]: custom.googleapis.com/process/start/time{}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[6]: custom.googleapis.com/system/load/average/1m{}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[54,55,57,76,79,87,93]: custom.googleapis.com/jvm/memory/committed{areanonheap,idCodeHeap 'non-nmethods'}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[63]: custom.googleapis.com/jvm/buffer/memory/used{iddirect}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[90]: custom.googleapis.com/jvm/threads/peak{}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[22]: custom.googleapis.com/system/cpu/usage{}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[95]: custom.googleapis.com/jvm/gc/memory/promoted{}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[43]: custom.googleapis.com/system/cpu/count{}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[0,53,70,97]: custom.googleapis.com/jvm/threads/states{stateblocked}; Points must be written in order. 
One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[26,30,41,80,92]: custom.googleapis.com/jvm/memory/max{idCompressed Class Space,areanonheap}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[52]: custom.googleapis.com/jvm/threads/daemon{}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[61]: custom.googleapis.com/process/files/open{}; Points must be written in order. One or more of the points specified had an older end time than the most recent point.: global{} timeSeries[89]: custom.googleapis.com/jvm/classes/loaded{}

carl-mastrangelo avatar Aug 10 '22 04:08 carl-mastrangelo

I found a solution that works for me: adding custom tags to the metrics to ensure they are unique across services/instances:

management:
  metrics:
    tags:
      app: my-unique-service-label
      instance: ${random.uuid}
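
If you are not using Spring Boot's property binding, a roughly equivalent sketch (the tag values are placeholders, and registry is whatever StackdriverMeterRegistry you build) is to set common tags on the registry directly:

registry.config().commonTags(
    "app", "my-unique-service-label",
    "instance", UUID.randomUUID().toString());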

See also:

  • https://stackoverflow.com/questions/61500324/unable-to-publish-spring-boot-metrics-to-gcp-stackdriver/73386941#73386941
  • https://github.com/GoogleCloudPlatform/spring-cloud-gcp/issues/1223

AQS-DTheuke avatar Aug 17 '22 10:08 AQS-DTheuke

@carl-mastrangelo how did you get those messages from the header?

smolarek999 avatar Aug 24 '22 14:08 smolarek999

@smolarek999 You can configure the limit using this method: https://grpc.github.io/grpc-java/javadoc/io/grpc/ManagedChannelBuilder.html#maxInboundMetadataSize-int-

See this comment for an example: https://github.com/micrometer-metrics/micrometer/issues/1268#issuecomment-1081594772
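
For plain grpc-java (outside the Quarkus/Kotlin builder shown earlier), a minimal sketch of raising that limit could look like this; the endpoint and the 1 MiB value are illustrative:

ManagedChannel channel = ManagedChannelBuilder
    .forAddress("monitoring.googleapis.com", 443)
    .maxInboundMetadataSize(1024 * 1024) // accept response headers larger than the 8 KB default
    .build();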

AQS-DTheuke avatar Aug 24 '22 15:08 AQS-DTheuke

@carl-mastrangelo how did you get those messages from the header?

This bug is reproducible locally, so I stepped through in a debugger until I could see the values in Netty.

carl-mastrangelo avatar Aug 24 '22 21:08 carl-mastrangelo

Is this still a problem? I'm asking since someone has pointed out a solution here.

marcingrzejszczak avatar Dec 21 '23 14:12 marcingrzejszczak

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

github-actions[bot] avatar Dec 31 '23 01:12 github-actions[bot]

There are a few confounding issues, but I was able to come up with a workaround.

That said, it would be nice if Micrometer opened up the constructor of StackdriverMeterRegistry:

https://github.com/micrometer-metrics/micrometer/blob/d904d760198c5e2e14b985b8ca0d7ef76bb7957a/implementations/micrometer-registry-stackdriver/src/main/java/io/micrometer/stackdriver/StackdriverMeterRegistry.java#L97

There are multiple parts to the workaround, but the one Micrometer can help with is opening up the visibility of that constructor. We need to be able to set the HTTP/2 settings on the client, and we cannot pass them in today. My hack is to use reflection to make that constructor accessible, which isn't ideal.
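
A rough illustration of that reflection hack (a sketch only; the constructor's exact parameter list varies between Micrometer versions, so this just makes the non-public constructors accessible):

for (Constructor<?> ctor : StackdriverMeterRegistry.class.getDeclaredConstructors()) {
  if (!Modifier.isPublic(ctor.getModifiers())) {
    ctor.setAccessible(true); // may require --add-opens on newer JDKs
  }
}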

(Aside: Micrometer is fundamentally incompatible with GCP's Cloud Run, which explicitly does not support custom metrics. (Even more of an aside: they seem to have modified that doc to suggest a sidecar rather than in-process reporting.) They instead suggest using log-based metrics, which are not as capable. My hack is to report metrics with the Generic Node resource type, which does accept an instance ID, as mentioned in the suggested answer you linked to.)

carl-mastrangelo avatar Dec 31 '23 23:12 carl-mastrangelo

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

github-actions[bot] avatar Jan 10 '24 01:01 github-actions[bot]

Closing due to lack of requested feedback. If you would like us to look at this issue, please provide the requested information and we will re-open.

github-actions[bot] avatar Jan 18 '24 01:01 github-actions[bot]

It is still a problem: for example, OTEL auto-instrumentation keeps failing on GCP because it cannot send profiles to Cloud Profiler.

matevarga avatar Mar 06 '24 12:03 matevarga