opentelemetry-android icon indicating copy to clipboard operation
opentelemetry-android copied to clipboard

Unexpected call to beginMessage() in ByteArrayProtoReader32 when exporting logs from disk

Open ploszczyca-spoton opened this issue 9 months ago • 2 comments

Description

I encountered a java.lang.IllegalStateException: Unexpected call to beginMessage() exception when OpenTelemetry's disk buffering feature attempted to export stored log batches. The issue appears to occur while deserializing log data, specifically within ByteArrayProtoReader32 and the ProtoAdapter.decode() method.

Stack Trace

Fatal Exception: java.lang.IllegalStateException: Unexpected call to beginMessage()
       at com.squareup.wire.ByteArrayProtoReader32.beginMessage(ByteArrayProtoReader32.kt:57)
       at com.squareup.wire.ProtoReader32AsProtoReader.beginMessage(ProtoReader32AsProtoReader.kt:1)
       at io.opentelemetry.diskbuffering.proto.common.v1.AnyValue$ProtoAdapter_AnyValue.decode(AnyValue.java:1)
       at io.opentelemetry.diskbuffering.proto.common.v1.AnyValue$ProtoAdapter_AnyValue.decode(AnyValue.java:2)
       at io.opentelemetry.diskbuffering.proto.common.v1.KeyValue$ProtoAdapter_KeyValue.decode(KeyValue.java:1)
       at io.opentelemetry.diskbuffering.proto.common.v1.KeyValue$ProtoAdapter_KeyValue.decode(KeyValue.java:2)
       at io.opentelemetry.diskbuffering.proto.logs.v1.LogRecord$ProtoAdapter_LogRecord.decode(LogRecord.java:1)
       at io.opentelemetry.diskbuffering.proto.logs.v1.LogRecord$ProtoAdapter_LogRecord.decode(LogRecord.java:2)
       at io.opentelemetry.diskbuffering.proto.logs.v1.ScopeLogs$ProtoAdapter_ScopeLogs.decode(ScopeLogs.java:1)
       at io.opentelemetry.diskbuffering.proto.logs.v1.ScopeLogs$ProtoAdapter_ScopeLogs.decode(ScopeLogs.java:2)
       at io.opentelemetry.diskbuffering.proto.logs.v1.ResourceLogs$ProtoAdapter_ResourceLogs.decode(ResourceLogs.java:1)
       at io.opentelemetry.diskbuffering.proto.logs.v1.ResourceLogs$ProtoAdapter_ResourceLogs.decode(ResourceLogs.java:2)
       at io.opentelemetry.diskbuffering.proto.logs.v1.LogsData$ProtoAdapter_LogsData.decode(LogsData.java:1)
       at io.opentelemetry.diskbuffering.proto.logs.v1.LogsData$ProtoAdapter_LogsData.decode(LogsData.java:2)
       at com.squareup.wire.ProtoAdapter.decode(ProtoAdapter.kt:2)
       at com.squareup.wire.ProtoAdapter.decode(ProtoAdapter.kt:16)
       at io.opentelemetry.contrib.disk.buffering.internal.serialization.deserializers.LogRecordDataDeserializer.deserialize(LogRecordDataDeserializer.java:6)
       at io.opentelemetry.contrib.disk.buffering.internal.exporter.FromDiskExporterImpl.lambda$exportStoredBatch$0(FromDiskExporterImpl.java:43)
       at io.opentelemetry.contrib.disk.buffering.internal.storage.files.ReadableFile.readAndProcess(ReadableFile.java:45)
       at io.opentelemetry.contrib.disk.buffering.internal.storage.Storage.readAndProcess(Storage.java:2)
       at io.opentelemetry.contrib.disk.buffering.internal.storage.Storage.readAndProcess(Storage.java:1)
       at io.opentelemetry.contrib.disk.buffering.internal.exporter.FromDiskExporterImpl.exportStoredBatch(FromDiskExporterImpl.java:37)
       at io.opentelemetry.contrib.disk.buffering.LogRecordFromDiskExporter.exportStoredBatch(LogRecordFromDiskExporter.java:2)
       at io.opentelemetry.android.features.diskbuffering.SignalFromDiskExporter.exportBatchOfLogs(SignalFromDiskExporter.kt:8)
       at io.opentelemetry.android.features.diskbuffering.SignalFromDiskExporter.exportBatchOfEach(SignalFromDiskExporter.kt:12)
       at io.opentelemetry.android.features.diskbuffering.scheduler.DefaultExportScheduler.onRun(DefaultExportScheduler.kt:9)
       at io.opentelemetry.android.internal.services.periodicwork.PeriodicRunnable.run(PeriodicRunnable.kt:6)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
       at java.lang.Thread.run(Thread.java:919)
        

Steps to Reproduce:

  1. Enable disk buffering in OpenTelemetry Android SDK.
        val diskBufferingConfig = DiskBufferingConfiguration.builder()
            .setEnabled(true)
            .build()
  1. Generate log events and store them on disk.
  2. Wait for the log export process to trigger.
  3. Observe the crash when logs are being read and deserialized.

Versions:

  • opentelemetry-android: 0.8.0-alpha
  • Android OS: 10

ploszczyca-spoton avatar Feb 19 '25 12:02 ploszczyca-spoton

Hi @ploszczyca-spoton

This seems to be related to this fix which should be available in the disk-buffering artifact version 1.43.0-alpha.

It should become available in opentelemetry-android in the next release, though you should still be able to use the latest disk-buffering version in your project in the meantime, by adding it explicitly alongside opentelemetry-android, like so:

dependencies {
  implementation("io.opentelemetry.android:android-agent:0.9.1-alpha")
  implementation("io.opentelemetry.contrib:opentelemetry-disk-buffering:1.43.0-alpha")
}

Which should override the older version provided in opentelemetry-android.

LikeTheSalad avatar Feb 20 '25 12:02 LikeTheSalad

Hi @LikeTheSalad, Thank you for the response! Great to hear that the issue is already addressed. The crash doesn’t occur often, so I will wait for the next opentelemetry-android release with disk-buffering version bump.

ploszczyca-spoton avatar Feb 20 '25 12:02 ploszczyca-spoton

@ploszczyca-spoton Sorry for the delay! We managed to get v0.12.0 released today. Can you try that one out and let us know if it resolves this for you? Thanks thanks.

breedx-splk avatar Jul 08 '25 22:07 breedx-splk

Note: I closed #997 because this does appear to be fixed. Please let us know. Thanks.

breedx-splk avatar Jul 09 '25 22:07 breedx-splk

This has been automatically marked as stale because it has been marked as needing author feedback and has not had any activity for 21 days. It will be closed automatically if there is no response from the author within 14 additional days from this comment.

github-actions[bot] avatar Jul 31 '25 01:07 github-actions[bot]

@breedx-splk Were we able to find any reproduction steps on that? We've tried the 0.12.0 as well and we are seeing significant number of similar crashes still. We didn't find the reproduction scenarios, but maybe you have some suggestions on that matter.

JakubMosakowski avatar Aug 07 '25 07:08 JakubMosakowski

Can you try running the latest demo-app from this repo and see if this problem is still happening for you? It's been a bit since I've run that app, but I'm quite sure that I've never seen that same issue in the demo app....and it uses disk buffering.

Can you share anything else that might be unusual about your configuration? I also wonder if there's just old/stale data sitting on disk that needs to be purged. Like, maybe something is trying to read old/corrupt data and when it fails that data stays around to continue causing problems. Maybe?

breedx-splk avatar Aug 08 '25 22:08 breedx-splk

It still occurs in 0.13.0. Sadly, we have not been able to reproduce it so far.

I analyzed fix https://github.com/open-telemetry/opentelemetry-java-contrib/pull/1571:

@Override
public List<LogRecordData> deserialize(byte[] source) throws DeserializationException {
  try {
    return ProtoLogsDataMapper.getInstance().fromProto(LogsData.ADAPTER.decode(source));
  } catch (IOException e) {
    throw new DeserializationException(e);
  }
}

Problem there is that issue in LogsData.ADAPTER.decode(source) throws IllegalStateException, not IOException:

Fatal Exception: java.lang.IllegalStateException: Unexpected call to beginMessage()
       at com.squareup.wire.ByteArrayProtoReader32.beginMessage(ByteArrayProtoReader32.kt:92)
       at com.squareup.wire.ProtoReader32AsProtoReader.beginMessage(ProtoReader32AsProtoReader.kt:31)
       at io.opentelemetry.diskbuffering.proto.logs.v1.ResourceLogs$ProtoAdapter_ResourceLogs.decode(ResourceLogs.java:215)
       at io.opentelemetry.diskbuffering.proto.logs.v1.ResourceLogs$ProtoAdapter_ResourceLogs.decode(ResourceLogs.java:177)
       at io.opentelemetry.diskbuffering.proto.logs.v1.LogsData$ProtoAdapter_LogsData.decode(LogsData.java:155)
       at io.opentelemetry.diskbuffering.proto.logs.v1.LogsData$ProtoAdapter_LogsData.decode(LogsData.java:124)

Therefore, the exception is not caught. It is also not mapped to DeserializationException and caught again in FromDiskExporterImpl.java


The previous issue https://github.com/open-telemetry/opentelemetry-android/issues/457 was also related to an IllegalArgumentException, but different one:

java.lang.IllegalArgumentException: java.net.ProtocolException: Unexpected tag 0                                                                                                   	at io.opentelemetry.contrib.disk.buffering.internal.serialization.deserializers.SpanDataDeserializer.deserialize(SpanDataDeserializer.java:28)
at io.opentelemetry.contrib.disk.buffering.internal.exporter.FromDiskExporterImpl.lambda$exportStoredBatch$0$io-opentelemetry-contrib-disk-buffering-internal-exporter-FromDiskExporterImpl(FromDiskExporterImpl.java:59)                                                                                                    	at io.opentelemetry.contrib.disk.buffering.internal.exporter.FromDiskExporterImpl$$ExternalSyntheticLambda0.apply(D8$$SyntheticClass:0)

That one was thrown inside opentelemetry code, not in the ByteArrayProtoReader32 (and is now replaced by DeserializationException)

(I will submit PR to https://github.com/open-telemetry/opentelemetry-java-contrib, but opinion on that would be great ;) )

JacekPietrasSpotOn avatar Aug 20 '25 13:08 JacekPietrasSpotOn

Hi @JacekPietrasSpotOn, what you mentioned about IllegalArgumentException sounds reasonable 👍

(I will submit PR to https://github.com/open-telemetry/opentelemetry-java-contrib, but opinion on that would be great ;) )

Sounds good, thanks!

LikeTheSalad avatar Aug 21 '25 07:08 LikeTheSalad

That fix in contrib looks good, I approved it and it's in the merge queue, but we just missed the release window, so it'll be out next month. Thanks again 👍🏻

breedx-splk avatar Aug 22 '25 15:08 breedx-splk

This has been automatically marked as stale because it has been marked as needing author feedback and has not had any activity for 21 days. It will be closed automatically if there is no response from the author within 14 additional days from this comment.

github-actions[bot] avatar Sep 12 '25 16:09 github-actions[bot]