cpp_client_telemetry
cpp_client_telemetry copied to clipboard
Android: SIGSEGV /SEGV_MAPERR in Logger_jni.cpp
Describe your environment. SDK built on commit 2b41385dd9858293ad98e6d4d66f000640eaab8a
Steps to reproduce. App crash soon after boot
What is the expected behavior? No crash
Additional context.
Operating system: Android
0.0.0 Linux 4.14.190-20767019-abN9700ZCU3EUBE #2 SMP PREEMPT Thu Feb 25 20:38:23 KST 2021 aarch64
CPU: arm64
8 CPUs
GPU: UNKNOWN
Crash reason: SIGSEGV /SEGV_MAPERR
Crash address: 0x0
Process uptime: not available
Thread 76 (crashed)
0 libmaesdk.so!Java_com_microsoft_applications_events_Logger_nativeLogEventProperties [Logger_jni.cpp : 198 + 0x0]
x0 = 0x0000000000000037 x1 = 0x0000006f13050b70
x2 = 0x0000000000000000 x3 = 0x0000006fd145a2b6
x4 = 0x0000006f13050b58 x5 = 0xb400006fe1a4c050
x6 = 0xfeff6b6b6073726d x7 = 0x7f7f7f7f7f7f7f7f
x8 = 0x744424b8aaafbe27 x9 = 0x744424b8aaafbe27
x10 = 0x0000000000430000 x11 = 0x0000000000000066
x12 = 0x0000000000000000 x13 = 0x0000000000000002
x14 = 0x0000006fd14ca4fc x15 = 0x0000000000000000
x16 = 0x0000006f343a59d8 x17 = 0x0000007265211b58
x18 = 0x0000006f110c6000 x19 = 0x0000000000000000
x20 = 0x0000006f13056000 x21 = 0x000000001b9c6230
x22 = 0x0000000000000000 x23 = 0x0000000012e925e8
x24 = 0x0000000000000000 x25 = 0x000000001bb5db28
x26 = 0x000000001bb5da00 x27 = 0x0000000000000000
x28 = 0x0000000012e95b58 fp = 0x0000006f13050d60
lr = 0x0000006f30e7547c sp = 0x0000006f13050d00
pc = 0x0000006f30e7547c
Found by: given as instruction pointer in context
1 base.odex + 0x122d1c
x19 = 0xb400007141aba0c0 x20 = 0x0000000000000000
x21 = 0x000000001b9c6230 x22 = 0x0000000000000000
x23 = 0x0000000012e925e8 x24 = 0x0000000000000000
x25 = 0x000000001bb5db28 x26 = 0x000000001bb5da00
x27 = 0x0000000000000000 x28 = 0x0000000012e95b58
fp = 0x0000000000000000 sp = 0x0000006f13050d70
pc = 0x0000006f644f1d20
Found by: call frame info
Tagging @larvacea and @saurabh-sagrawal for their android expertise.
Thanks, @sid-dahiya for tagging Martin and Saurabh.
@theta3 - 1DS C++ Observability team doesn't provide support for Mobile platform SDKs. You'd have to debug it on your side yourself, to understand the cause and propose a PR with a fix if required. We will help reviewing the PR.
@larvacea ( Martin ) from the Office Telemetry team might have some ideas, as he did the implementation for Android. But if the issue is not impacting Office apps, then there won't be anybody else here who can help with this issue.
In current code base for the crash stack Logger_jni.cpp : 198 is an empty line https://github.com/microsoft/cpp_client_telemetry/blob/main/lib/jni/Logger_jni.cpp If it's a consistent repro can you find out what Java API is used which is calling this jni method and if the arguments to the API are valid.
It's in this branch, not the main branch https://github.com/microsoft/cpp_client_telemetry/tree/outlookmobile/droid.
The method is this:
JNIEXPORT void JNICALL Java_com_microsoft_applications_events_Logger_nativeLogEventProperties(
JNIEnv* env,
jobject /* this */,
jlong nativeLoggerPtr,
// EventProperties
jstring jstrEventName,
jstring jstrEventType,
jint jEventLatency,
jint jEventPersistence,
jdouble jEventPopSample,
jlong jEventPolicyBitflags,
jlong jTimestampInMillis,
jobjectArray jEventPropertyStringKey,
jobjectArray jEventPropertyValue) {
auto logger = reinterpret_cast<ILogger*>(nativeLoggerPtr);
EventProperties properties = GetEventProperties(env, jstrEventName, jstrEventType, jEventLatency, jEventPersistence,
jEventPopSample, jEventPolicyBitflags, jTimestampInMillis, jEventPropertyStringKey, jEventPropertyValue);
logger->LogEvent(properties);
}
which is called from Logger.java public void logEvent(final EventProperties properties).
It's not a consistent repro and I can't repro it locally. Some tester found it but then it go away on next boot.
It seems the native ptr that Logger is holding is invalid at the time of jni call. Can you share how are you getting the instance of Logger, i.e your app code and how you are calling the API?
Here is the init code (redacted for brevity):
HttpClient client = new OneDSHttpClient(context);
OfflineRoom.connectContext(context); // from an object that has getApplicationContext
ILogConfiguration logConfiguration = LogManager.getLogConfigurationCopy();
...
logConfiguration.set(......)
...
LogManager.initialize(tenantId, logConfiguration);
...
LogManager.setContext(......);
...
ILogger myLocalLogger = LogManager.getLogger(tenantId, source);
The myLocalLogger is hold as a private variable of a Java Logger object.
Our Java Logger class has this method:
public void logEvent(LoggerEvent loggerEvent) {
if (m1DSLogger != null) {
m1DSLogger.logEvent(eventPropertiesFromLoggerEvent(loggerEvent));
}
}
When we log, we just call this method. I can't repro this locally and it only happens for a very small subset of devices
@saurabh-sagrawal any thought?
We have been using this code for a long time without any change and the crash is only for a small subset of devices
Here is the init code (redacted for brevity):
HttpClient client = new OneDSHttpClient(context); OfflineRoom.connectContext(context); // from an object that has getApplicationContext ILogConfiguration logConfiguration = LogManager.getLogConfigurationCopy(); ... logConfiguration.set(......) ... LogManager.initialize(tenantId, logConfiguration); ... LogManager.setContext(......); ... ILogger myLocalLogger = LogManager.getLogger(tenantId, source);The
myLocalLoggeris hold as a private variable of a Java Logger object.Our Java Logger class has this method:
public void logEvent(LoggerEvent loggerEvent) { if (m1DSLogger != null) { m1DSLogger.logEvent(eventPropertiesFromLoggerEvent(loggerEvent)); } }When we log, we just call this method. I can't repro this locally and it only happens for a very small subset of devices
@theta3, in the above code snippet I'm assuming that you are assigning myLocalLogger to m1DSLogger somewhere and it's the same logger and isn't any other instance of the Logger.
Also in your latest reply do you mean to say the crash started to happen more recently and earlier it was not observed? Can you share the sync point from which it started to happen.
@saurabh-sagrawal yes, that's correct. It's the same ILogger. It's a private variable of a Java class. And the Java class instance is held in a static dictionary. We didn't notice the crash before, yes. The latest sync point is commit 2b9c151c1b5c861d32979e766aa3c35d0d93d99a in master. The sync point right before that is commit 503a926941124095397a6b861ee601a4522c6c1f. We didn't notice a crash like this before 503a926
What was the sync point before 503a926, we would probably need to check what went in between that CL# and 503a926 since first crash was seen after 503a926.
No, after 2b9c151 is when we start noticing crashing.
First crash was seen AFTER 2b9c151
The sync point before 2b9c151 IS 503a926
@saurabh-sagrawal any thought on what happens between 503a926 and 2b9c151?
@theta3 - Have you found any workaround/fix for this issue?
nope
nope I think you were looking into some scenarios where it would be an issue but I guess you didn't had any luck with it. Another thing would be to check how severe is the issue right now, is the counts still in the same range or reduced.