cpp_client_telemetry icon indicating copy to clipboard operation
cpp_client_telemetry copied to clipboard

Android: SIGSEGV /SEGV_MAPERR in Logger_jni.cpp

Open theta3 opened this issue 4 years ago • 15 comments
trafficstars

Describe your environment. SDK built on commit 2b41385dd9858293ad98e6d4d66f000640eaab8a

Steps to reproduce. App crash soon after boot

What is the expected behavior? No crash

Additional context.

Operating system: Android
                  0.0.0 Linux 4.14.190-20767019-abN9700ZCU3EUBE #2 SMP PREEMPT Thu Feb 25 20:38:23 KST 2021 aarch64
CPU: arm64
     8 CPUs

GPU: UNKNOWN

Crash reason:  SIGSEGV /SEGV_MAPERR
Crash address: 0x0
Process uptime: not available

Thread 76 (crashed)
 0  libmaesdk.so!Java_com_microsoft_applications_events_Logger_nativeLogEventProperties [Logger_jni.cpp : 198 + 0x0]
     x0 = 0x0000000000000037    x1 = 0x0000006f13050b70
     x2 = 0x0000000000000000    x3 = 0x0000006fd145a2b6
     x4 = 0x0000006f13050b58    x5 = 0xb400006fe1a4c050
     x6 = 0xfeff6b6b6073726d    x7 = 0x7f7f7f7f7f7f7f7f
     x8 = 0x744424b8aaafbe27    x9 = 0x744424b8aaafbe27
    x10 = 0x0000000000430000   x11 = 0x0000000000000066
    x12 = 0x0000000000000000   x13 = 0x0000000000000002
    x14 = 0x0000006fd14ca4fc   x15 = 0x0000000000000000
    x16 = 0x0000006f343a59d8   x17 = 0x0000007265211b58
    x18 = 0x0000006f110c6000   x19 = 0x0000000000000000
    x20 = 0x0000006f13056000   x21 = 0x000000001b9c6230
    x22 = 0x0000000000000000   x23 = 0x0000000012e925e8
    x24 = 0x0000000000000000   x25 = 0x000000001bb5db28
    x26 = 0x000000001bb5da00   x27 = 0x0000000000000000
    x28 = 0x0000000012e95b58    fp = 0x0000006f13050d60
     lr = 0x0000006f30e7547c    sp = 0x0000006f13050d00
     pc = 0x0000006f30e7547c
    Found by: given as instruction pointer in context
 1  base.odex + 0x122d1c
    x19 = 0xb400007141aba0c0   x20 = 0x0000000000000000
    x21 = 0x000000001b9c6230   x22 = 0x0000000000000000
    x23 = 0x0000000012e925e8   x24 = 0x0000000000000000
    x25 = 0x000000001bb5db28   x26 = 0x000000001bb5da00
    x27 = 0x0000000000000000   x28 = 0x0000000012e95b58
     fp = 0x0000000000000000    sp = 0x0000006f13050d70
     pc = 0x0000006f644f1d20
    Found by: call frame info

theta3 avatar Sep 27 '21 17:09 theta3

Tagging @larvacea and @saurabh-sagrawal for their android expertise.

sid-dahiya avatar Sep 28 '21 18:09 sid-dahiya

Thanks, @sid-dahiya for tagging Martin and Saurabh.

@theta3 - 1DS C++ Observability team doesn't provide support for Mobile platform SDKs. You'd have to debug it on your side yourself, to understand the cause and propose a PR with a fix if required. We will help reviewing the PR.

@larvacea ( Martin ) from the Office Telemetry team might have some ideas, as he did the implementation for Android. But if the issue is not impacting Office apps, then there won't be anybody else here who can help with this issue.

lalitb avatar Sep 28 '21 18:09 lalitb

In current code base for the crash stack Logger_jni.cpp : 198 is an empty line https://github.com/microsoft/cpp_client_telemetry/blob/main/lib/jni/Logger_jni.cpp If it's a consistent repro can you find out what Java API is used which is calling this jni method and if the arguments to the API are valid.

saurabh-sagrawal avatar Sep 29 '21 00:09 saurabh-sagrawal

It's in this branch, not the main branch https://github.com/microsoft/cpp_client_telemetry/tree/outlookmobile/droid.

The method is this:

JNIEXPORT void JNICALL Java_com_microsoft_applications_events_Logger_nativeLogEventProperties(
        JNIEnv* env,
        jobject /* this */,
        jlong nativeLoggerPtr,
        // EventProperties
        jstring jstrEventName,
        jstring jstrEventType,
        jint jEventLatency,
        jint jEventPersistence,
        jdouble jEventPopSample,
        jlong jEventPolicyBitflags,
        jlong jTimestampInMillis,
        jobjectArray jEventPropertyStringKey,
        jobjectArray jEventPropertyValue) {
    auto logger = reinterpret_cast<ILogger*>(nativeLoggerPtr);
    EventProperties properties = GetEventProperties(env, jstrEventName, jstrEventType, jEventLatency, jEventPersistence,
            jEventPopSample, jEventPolicyBitflags, jTimestampInMillis, jEventPropertyStringKey, jEventPropertyValue);
    logger->LogEvent(properties);
}

which is called from Logger.java public void logEvent(final EventProperties properties).

It's not a consistent repro and I can't repro it locally. Some tester found it but then it go away on next boot.

theta3 avatar Sep 29 '21 01:09 theta3

It seems the native ptr that Logger is holding is invalid at the time of jni call. Can you share how are you getting the instance of Logger, i.e your app code and how you are calling the API?

saurabh-sagrawal avatar Sep 30 '21 18:09 saurabh-sagrawal

Here is the init code (redacted for brevity):

HttpClient client = new OneDSHttpClient(context);
OfflineRoom.connectContext(context); // from an object that has getApplicationContext
ILogConfiguration logConfiguration = LogManager.getLogConfigurationCopy();
...
logConfiguration.set(......)
...
LogManager.initialize(tenantId, logConfiguration);
...
LogManager.setContext(......);
...
ILogger myLocalLogger = LogManager.getLogger(tenantId, source);

The myLocalLogger is hold as a private variable of a Java Logger object.

Our Java Logger class has this method:

public void logEvent(LoggerEvent loggerEvent) {
    if (m1DSLogger != null) {
        m1DSLogger.logEvent(eventPropertiesFromLoggerEvent(loggerEvent));
    }
}

When we log, we just call this method. I can't repro this locally and it only happens for a very small subset of devices

theta3 avatar Sep 30 '21 18:09 theta3

@saurabh-sagrawal any thought?

We have been using this code for a long time without any change and the crash is only for a small subset of devices

theta3 avatar Oct 01 '21 17:10 theta3

Here is the init code (redacted for brevity):

HttpClient client = new OneDSHttpClient(context);
OfflineRoom.connectContext(context); // from an object that has getApplicationContext
ILogConfiguration logConfiguration = LogManager.getLogConfigurationCopy();
...
logConfiguration.set(......)
...
LogManager.initialize(tenantId, logConfiguration);
...
LogManager.setContext(......);
...
ILogger myLocalLogger = LogManager.getLogger(tenantId, source);

The myLocalLogger is hold as a private variable of a Java Logger object.

Our Java Logger class has this method:

public void logEvent(LoggerEvent loggerEvent) {
    if (m1DSLogger != null) {
        m1DSLogger.logEvent(eventPropertiesFromLoggerEvent(loggerEvent));
    }
}

When we log, we just call this method. I can't repro this locally and it only happens for a very small subset of devices

@theta3, in the above code snippet I'm assuming that you are assigning myLocalLogger to m1DSLogger somewhere and it's the same logger and isn't any other instance of the Logger.

Also in your latest reply do you mean to say the crash started to happen more recently and earlier it was not observed? Can you share the sync point from which it started to happen.

saurabh-sagrawal avatar Oct 01 '21 19:10 saurabh-sagrawal

@saurabh-sagrawal yes, that's correct. It's the same ILogger. It's a private variable of a Java class. And the Java class instance is held in a static dictionary. We didn't notice the crash before, yes. The latest sync point is commit 2b9c151c1b5c861d32979e766aa3c35d0d93d99a in master. The sync point right before that is commit 503a926941124095397a6b861ee601a4522c6c1f. We didn't notice a crash like this before 503a926

theta3 avatar Oct 01 '21 19:10 theta3

What was the sync point before 503a926, we would probably need to check what went in between that CL# and 503a926 since first crash was seen after 503a926.

saurabh-sagrawal avatar Oct 01 '21 23:10 saurabh-sagrawal

No, after 2b9c151 is when we start noticing crashing.

First crash was seen AFTER 2b9c151

The sync point before 2b9c151 IS 503a926

theta3 avatar Oct 02 '21 00:10 theta3

@saurabh-sagrawal any thought on what happens between 503a926 and 2b9c151?

theta3 avatar Oct 05 '21 18:10 theta3

@theta3 - Have you found any workaround/fix for this issue?

lalitb avatar Dec 13 '21 20:12 lalitb

nope

theta3 avatar Dec 13 '21 21:12 theta3

nope I think you were looking into some scenarios where it would be an issue but I guess you didn't had any luck with it. Another thing would be to check how severe is the issue right now, is the counts still in the same range or reduced.

saurabh-sagrawal avatar Dec 13 '21 22:12 saurabh-sagrawal