MQTT-Client-Framework
MQTT-Client-Framework copied to clipboard
-[MQTTDecoder stream:handleEvent:] crashing with EXC_BAD_ACCESS KERN_INVALID_ADDRESS
Over the past days I've seen a few crashes like this for my app in Crashlytics. Any ideas what might be going wrong? Using MQTTClient 0.9.2.
Crashed: com.apple.main-thread
0 libobjc.A.dylib 0x18b3fc150 objc_msgSend + 16
1 MQTTClient 0x1009f62b8 -[MQTTDecoder stream:handleEvent:] (MQTTDecoder.m:132)
2 CoreFoundation 0x18c8d201c _signalEventSync + 212
3 CoreFoundation 0x18c8d1f28 _cfstream_shared_signalEventSync + 440
4 CoreFoundation 0x18c93542c __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 24
5 CoreFoundation 0x18c934d9c __CFRunLoopDoSources0 + 540
6 CoreFoundation 0x18c9329a8 __CFRunLoopRun + 744
7 CoreFoundation 0x18c862da4 CFRunLoopRunSpecific + 424
8 GraphicsServices 0x18e2cc074 GSEventRunModal + 100
9 UIKit 0x192b1d058 UIApplicationMain + 208
10 MyApp 0x1001970e8 main (AppDelegate.swift:20)
11 libdyld.dylib 0x18b87159c start + 4
The one time so far I've managed to reproduce this on an Xcode debug, I got the following as the tail of the console log. I added one more debug print on line 130 of MQTTDecoder.m to print the dataBuffer
as a UTF-8 string (prefixed with [MQTTDecoder] data:
in the log), and looks like the problematic data was cting.
. My guess is the backend is doing something seriously wrong here, but it'd be nice if the iOS client didn't crash on it.
2017-07-02 13:53:16.261999+0300 MyApp[61535:16904187] [MQTTDecoder] received (54)=<31343938 39393237 38383a20 536f636b 65742065 72726f72 206f6e20 636c6965 6e74203c 756e6b6e 6f776e3e 2c206469 73636f6e 6e65>...
2017-07-02 13:53:16.262015+0300 MyApp[61535:16904187] [MQTTDecoder] data: 1498992788: Socket error on client <unknown>, disconne
2017-07-02 13:53:16.262034+0300 MyApp[61535:16904187] [MQTTSession] other state
2017-07-02 13:53:16.262048+0300 MyApp[61535:16904187] [MQTTDecoder] NSStreamEventHasBytesAvailable
2017-07-02 13:53:16.262257+0300 MyApp[61535:16904187] [MQTTDecoder] fixedHeader=0x63
2017-07-02 13:53:16.262724+0300 MyApp[61535:16904187] [MQTTDecoder] digit=0x74 0x74 0 1
2017-07-02 13:53:16.262829+0300 MyApp[61535:16904187] [MQTTDecoder] remainingLength=116
2017-07-02 13:53:16.262862+0300 MyApp[61535:16904187] [MQTTDecoder] read 116 116
2017-07-02 13:53:16.262975+0300 MyApp[61535:16904187] [MQTTDecoder] received (118)=<6374696e 672e0a31 34393839 39323738 383a204e 65772063 6f6e6e65 6374696f 6e206672 6f6d2031 37382e35 352e312e 3138206f 6e20706f 72742031 3838332e 0a313439 38393932 3738383a 204e6577 20636c69 656e7420 636f6e6e 65637465 64206672 6f6d2031 37382e35 352e312e 31382061 7320>...
2017-07-02 13:53:16.263031+0300 MyApp[61535:16904187] [MQTTDecoder] data: cting.
1498992788: New connection from 178.55.1.18 on port 1883.
1498992788: New client connected from 178.55.1.18 as
2017-07-02 13:53:16.263055+0300 MyApp[61535:16904187] [MQTTMessage] unexpected payload after packet identifier
2017-07-02 13:53:16.263221+0300 MyApp[61535:16904187] [MQTTSession] MQTT illegal message received
2017-07-02 13:53:16.263238+0300 MyApp[61535:16904187] [MQTTSession] closeInternal
2017-07-02 13:53:16.263308+0300 MyApp[61535:16904187] [MQTTCFSocketTransport] close
I have met the same issue, any update?
Same issue, how is this going?
the cting.
is just the continuation of the data sent before ...disconne
What broker are you using? It seems the broker is sending mosquitto debug messages to the mqtt connection. This is not MQTT protocol.
The protocol violation is detected. The crash is probably caused by data already received which shouldn't be processed anymore after detecting the protocol violation
Hi @ckrey and sorry for the late reply — the backend is using Mosquitto (https://github.com/toke/docker-mosquitto) as far as I know. I will try to find out more once the backend developers are back from vacations :)
Strangely this bug peaked in the analytics at the latter half of June, then was mostly gone for most of July, until another peak over the very last few days. App usage has been quite constant over this period so it's a bit puzzling what exactly is causing it.
I have same issue on my side. Also i use Mosquito broker on other side. Do you have update or advice what to do?
I might have figured out why this happened. I called close()
on my MQTTSession
right before it was deallocated, which somehow led to MQTTDecoder
getting deallocated right after it called decoder:didReceiveMessage:
and crashing on next line's self.dataBuffer = nil;
. Waiting for closeWithDisconnectHandler
to return the callback before deiniting the session seems to have fixed this as far as I can tell.
Still saw the same crash at least once (though only on a non-debug build so far) despite the fix mentioned above. But it seems to happen much more rarely now, if anything.
Update: The "fix" I mentioned above didn't end up solving the problem, and if anything we actually saw the crashes peak over the past week or two. The root cause of the problem definitely seems to be some garbage (Mosquitto debug logs) being pushed into the pipeline by the backend (unfortunately I can't do anything about that). I tried a couple of other iOS MQTT client libraries and they actually dealt way worse with this problem, crashing right away when connecting, or never managing to parse any data.
However, it seems – and this still needs more data points to make sure but at 40% adoption and zero crashes so far, it sure looks promising – that my latest update got rid of the crashes simply by holding onto each MQTTSession
instance for 30 more seconds after it disconnects before allowing it to be deallocated. The garbage in the data pipeline remains, but this way it doesn't seem to lead to the memory problems. (I'll post another update after I've seen whether the crashes truly are gone now.)
Yes, this time the crashes definitely seem to be gone for good. Still zero crashes reported for the newest app version after five days (we used to have at least a dozen every day).
I have the same crash in Fabric. Any updates on this? @repomies
@pablogeek yes, looks like in my case it completely disappeared after I deployed the fix described above – just holding onto the MQTTSession
instance for extra 30 seconds after disconnecting it (and probably that's much longer than needed) so that the MQTTDecoder
instance doesn't end up deallocated when it's still trying to process an invalid message.
@repomies I am also facing the same issue. How do you hold the session after disconnection? A sample code would help. Thanks.
Can you describe what you are doing with MQTT? Are you subscribing to a topic with a lot of large retained messages and QoS0, which causes the broker to flood the client with data?
Or do you use QoS1/2 but do not have specified a reasonable max_inflight_messages
.
I do not want to say the library is error-free, but under "normal" circumstances it does behave quite stable
@ckrey While the application is moving to background, I close the connection to the MQTT server its an Azure IoT Hub. While coming back to foreground, I try to connect to MQTT server again. Once in a couple of days the crash appears.
- (void)applicationDidEnterBackground:(UIApplication *)application {
// Use this method to release shared resources, save user data, invalidate timers, and store enough application state information to restore your application to its current state in case it is terminated later.
// If your application supports background execution, this method is called instead of applicationWillTerminate: when the user quits.
MQTTSession *session = [[MQTTSessionManager sharedManager] session];
[session closeAndWait:1];
}
- (void)applicationDidBecomeActive:(UIApplication *)application {
// Restart any tasks that were paused (or not yet started) while the application was inactive. If the application was previously in the background, optionally refresh the user interface.
MQTTSession *session = [[MQTTSessionManager sharedManager] session];
if ([session status] == MQTTSessionStatusClosed ) {
NSString *clientID = [ApplicationManager sharedManager].clientID;
NSString *sas = [ApplicationManager sharedManager].sasKey;
NSString *hostURL = [ApplicationManager sharedManager].hostURL;
[[MQTTSessionManager sharedManager] setHost:hostURL];
[[MQTTSessionManager sharedManager] setClientID:clientID];
[[MQTTSessionManager sharedManager] connectWithUserName:[NSString stringWithFormat:@"%@/%@",hostURL,clientID] andPassword:sas];
}
}
Relevant probably: https://github.com/facebook/SocketRocket/issues/532
I believe this bug is fixed on master: https://github.com/novastone-media/MQTT-Client-Framework/blob/master/MQTTClient/MQTTClient/MQTTCFSocketTransport.m#L39-L46
We had to move to using queues instead of run loop which is probably better anyway. There are some bits to resolve before making official release, but feel free to try it out from master.
Ok, 0.10.0 version is on CocoaPods and this issue should be fixed I believe. Please give it a try.
Using MQTTClient 0.12.0, still many crashes like this for my app, [MQTTDecoder stream:handleEvent:] @jcavar
Do you have any idea how it happens?
I think the crash is related to a synchronization issue in close
method (MQTTDecoder.m:88
) leading to a deadlock. I believe most of the clients experience it on the main thread as it's the default configuration.
if (self.queue != dispatch_get_specific(&QueueIdentityKey)) {
dispatch_sync(self.queue, ^{ // deadlock
[self internalClose];
});
}
Existing implementation seems to relay on the assumption that two different queues cannot perform blocks on the same thread. This assumption is not correct as GCD may skip switching threads when it's applicable (for performance reasons).
Here is a test that should reproduce it.
#import <XCTest/XCTest.h>
#import "MQTTDecoder.h"
@interface MQTTDecoderTests : XCTestCase
@end
@implementation MQTTDecoderTests
- (void)testCloseFromNonMainDispachQueueOnMainThread {
MQTTDecoder *sut = [[MQTTDecoder alloc] init];
dispatch_queue_t customQueue = dispatch_queue_create("mqttdecodertests.queue", nil);
dispatch_sync(customQueue, ^{
[sut close];
});
XCTAssertTrue(YES); // test passed if this line is reached
}
@end
The fix is not trivial as the MQTTDecoder
using externally available queue. My naive fix would at least make some cases working, and throw exceptions if the code cannot be performed synchronously in a safe way.
- (void)close {
BOOL dispatchedOnDecoderQueue = self.queue == dispatch_get_specific(&QueueIdentityKey);
BOOL calledOnMainThreadAndDecoderUsingMainQueue = [NSThread isMainThread] && self.queue == dispatch_get_main_queue();
BOOL canSafelyInlineInternalClose = dispatchedOnDecoderQueue || calledOnMainThreadAndDecoderUsingMainQueue;
if (!canSafelyInlineInternalClose) {
@throw [NSException exceptionWithName:NSInternalInconsistencyException
reason:@"Cannot close MQTTDecoder synchronously"
userInfo:nil];
}
[self internalClose];
}
Maybe we could improve the fix by replacing the exception with a code niling the delegate and dispatching internalClose
asynchronously? I am not sure if that would have any bad implications.
This is still happen because QueueIdentityKey = NULL.
I have fixed by this way, and it works.
@interface MQTTDecoder() {
void *QueueIdentityKey;
}
@property (nonatomic) NSMutableArray<NSInputStream *> *streams;
@property (readonly) void *QueueIdentityKey;
@end
- (void *)QueueIdentityKey {
return QueueIdentityKey;
}
- (void)setQueue:(dispatch_queue_t)queue {
_queue = queue;
// We're going to use dispatch_queue_set_specific() to "mark" our queue.
// The dispatch_queue_set_specific() and dispatch_get_specific() functions take a "void *key" parameter.
// Later we can use dispatch_get_specific() to determine if we're executing on our queue.
// From the documentation:
//
// > Keys are only compared as pointers and are never dereferenced.
// > Thus, you can use a pointer to a static variable for a specific subsystem or
// > any other value that allows you to identify the value uniquely.
//
// So we're just going to use the memory address of an ivar.
QueueIdentityKey = &QueueIdentityKey;
dispatch_queue_set_specific(_queue, QueueIdentityKey, QueueIdentityKey, NULL);
}
- (void)close {
// https://github.com/novastone-media/MQTT-Client-Framework/issues/325
dispatch_block_t block = ^{
[self internalClose];
};
if (dispatch_get_specific(QueueIdentityKey))
block();
else
dispatch_sync(_queue, block);
}
We merged this a while ago I think. Is this issue still happening?
I’m sorry I can’t answer for myself, since our app’s backend infrastructure unfortunately no longer exists.
This issue also happened in my application lately, using version 0.15.2
Can you provide more information about how it happens (logs, conditions, steps)?
https://github.com/novastone-media/MQTT-Client-Framework/issues/526
Crashed: com.apple.main-thread 0 libsystem_kernel.dylib 0x1a5cc7db8 close + 8 1 CoreFoundation 0x1a5e6671c CFSocketInvalidate + 820 2 CFNetwork 0x1a92734ec _CFStreamErrorFromCFError + 158756 3 CoreFoundation 0x1a5e708cc _CFStreamClose + 108 4 MQTTClient 0x105c7e134 -[MQTTCFSocketDecoder close] + 60 5 MQTTClient 0x105c7f6d4 -[MQTTCFSocketTransport internalClose] + 244 6 MQTTClient 0x105c8c718 -[MQTTSession closeInternal] + 332 7 MQTTClient 0x105c91130 -[MQTTSession error:error:] + 224 8 MQTTClient 0x105c90e80 -[MQTTSession connectionError:] + 60 9 MQTTClient 0x105c7f880 -[MQTTCFSocketTransport encoder:didFailWithError:] + 104 10 MQTTClient 0x105c7e90c -[MQTTCFSocketEncoder stream:handleEvent:] + 508 11 MQTTClient 0x105c988f8 -[MQTTSSLSecurityPolicyEncoder stream:handleEvent:] + 124 12 CoreFoundation 0x1a5e746ac _signalEventSync + 216 13 CoreFoundation 0x1a5e747b8 ___signalEventQueue_block_invoke + 28 14 libdispatch.dylib 0x1a5b7eec4 _dispatch_call_block_and_release + 32 15 libdispatch.dylib 0x1a5b8033c _dispatch_client_callout + 20 16 libdispatch.dylib 0x1a5b8c600 _dispatch_main_queue_callback_4CF + 832 17 CoreFoundation 0x1a5e5b41c CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE + 16 18 CoreFoundation 0x1a5e56034 __CFRunLoopRun + 1708 19 CoreFoundation 0x1a5e55660 CFRunLoopRunSpecific + 480 20 GraphicsServices 0x1b0266604 GSEventRunModal + 164 21 UIKitCore 0x1aa02a15c UIApplicationMain + 1944 22 HubbleConnected 0x1028d0988 main + 14 (main.m:14) 23 libdyld.dylib 0x1a5cd11ec start + 4
Still we are getting this crash in iOS
I think should avoid doing some potentially risky things in the dealloc method.