MQTT-Client-Framework icon indicating copy to clipboard operation
MQTT-Client-Framework copied to clipboard

-[MQTTDecoder stream:handleEvent:] crashing with EXC_BAD_ACCESS KERN_INVALID_ADDRESS

Open repomies opened this issue 7 years ago • 29 comments

Over the past days I've seen a few crashes like this for my app in Crashlytics. Any ideas what might be going wrong? Using MQTTClient 0.9.2.

Crashed: com.apple.main-thread
0  libobjc.A.dylib                0x18b3fc150 objc_msgSend + 16
1  MQTTClient                     0x1009f62b8 -[MQTTDecoder stream:handleEvent:] (MQTTDecoder.m:132)
2  CoreFoundation                 0x18c8d201c _signalEventSync + 212
3  CoreFoundation                 0x18c8d1f28 _cfstream_shared_signalEventSync + 440
4  CoreFoundation                 0x18c93542c __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 24
5  CoreFoundation                 0x18c934d9c __CFRunLoopDoSources0 + 540
6  CoreFoundation                 0x18c9329a8 __CFRunLoopRun + 744
7  CoreFoundation                 0x18c862da4 CFRunLoopRunSpecific + 424
8  GraphicsServices               0x18e2cc074 GSEventRunModal + 100
9  UIKit                          0x192b1d058 UIApplicationMain + 208
10 MyApp                          0x1001970e8 main (AppDelegate.swift:20)
11 libdyld.dylib                  0x18b87159c start + 4

repomies avatar Jun 18 '17 09:06 repomies

The one time so far I've managed to reproduce this on an Xcode debug, I got the following as the tail of the console log. I added one more debug print on line 130 of MQTTDecoder.m to print the dataBuffer as a UTF-8 string (prefixed with [MQTTDecoder] data: in the log), and looks like the problematic data was cting.. My guess is the backend is doing something seriously wrong here, but it'd be nice if the iOS client didn't crash on it.

2017-07-02 13:53:16.261999+0300 MyApp[61535:16904187] [MQTTDecoder] received (54)=<31343938 39393237 38383a20 536f636b 65742065 72726f72 206f6e20 636c6965 6e74203c 756e6b6e 6f776e3e 2c206469 73636f6e 6e65>...
2017-07-02 13:53:16.262015+0300 MyApp[61535:16904187] [MQTTDecoder] data: 1498992788: Socket error on client <unknown>, disconne
2017-07-02 13:53:16.262034+0300 MyApp[61535:16904187] [MQTTSession] other state
2017-07-02 13:53:16.262048+0300 MyApp[61535:16904187] [MQTTDecoder] NSStreamEventHasBytesAvailable
2017-07-02 13:53:16.262257+0300 MyApp[61535:16904187] [MQTTDecoder] fixedHeader=0x63
2017-07-02 13:53:16.262724+0300 MyApp[61535:16904187] [MQTTDecoder] digit=0x74 0x74 0 1
2017-07-02 13:53:16.262829+0300 MyApp[61535:16904187] [MQTTDecoder] remainingLength=116
2017-07-02 13:53:16.262862+0300 MyApp[61535:16904187] [MQTTDecoder] read 116 116
2017-07-02 13:53:16.262975+0300 MyApp[61535:16904187] [MQTTDecoder] received (118)=<6374696e 672e0a31 34393839 39323738 383a204e 65772063 6f6e6e65 6374696f 6e206672 6f6d2031 37382e35 352e312e 3138206f 6e20706f 72742031 3838332e 0a313439 38393932 3738383a 204e6577 20636c69 656e7420 636f6e6e 65637465 64206672 6f6d2031 37382e35 352e312e 31382061 7320>...
2017-07-02 13:53:16.263031+0300 MyApp[61535:16904187] [MQTTDecoder] data: cting.
1498992788: New connection from 178.55.1.18 on port 1883.
1498992788: New client connected from 178.55.1.18 as
2017-07-02 13:53:16.263055+0300 MyApp[61535:16904187] [MQTTMessage] unexpected payload after packet identifier
2017-07-02 13:53:16.263221+0300 MyApp[61535:16904187] [MQTTSession] MQTT illegal message received
2017-07-02 13:53:16.263238+0300 MyApp[61535:16904187] [MQTTSession] closeInternal
2017-07-02 13:53:16.263308+0300 MyApp[61535:16904187] [MQTTCFSocketTransport] close

repomies avatar Jul 02 '17 11:07 repomies

I have met the same issue, any update?

ithaqua avatar Jul 06 '17 03:07 ithaqua

Same issue, how is this going?

pablogeek avatar Jul 07 '17 07:07 pablogeek

the cting. is just the continuation of the data sent before ...disconne

What broker are you using? It seems the broker is sending mosquitto debug messages to the mqtt connection. This is not MQTT protocol.

The protocol violation is detected. The crash is probably caused by data already received which shouldn't be processed anymore after detecting the protocol violation

ckrey avatar Jul 07 '17 08:07 ckrey

Hi @ckrey and sorry for the late reply — the backend is using Mosquitto (https://github.com/toke/docker-mosquitto) as far as I know. I will try to find out more once the backend developers are back from vacations :)

Strangely this bug peaked in the analytics at the latter half of June, then was mostly gone for most of July, until another peak over the very last few days. App usage has been quite constant over this period so it's a bit puzzling what exactly is causing it.

repomies avatar Aug 01 '17 10:08 repomies

I have same issue on my side. Also i use Mosquito broker on other side. Do you have update or advice what to do?

rockyftn avatar Aug 09 '17 21:08 rockyftn

I might have figured out why this happened. I called close() on my MQTTSession right before it was deallocated, which somehow led to MQTTDecoder getting deallocated right after it called decoder:didReceiveMessage: and crashing on next line's self.dataBuffer = nil;. Waiting for closeWithDisconnectHandler to return the callback before deiniting the session seems to have fixed this as far as I can tell.

repomies avatar Aug 17 '17 21:08 repomies

Still saw the same crash at least once (though only on a non-debug build so far) despite the fix mentioned above. But it seems to happen much more rarely now, if anything.

repomies avatar Aug 21 '17 09:08 repomies

Update: The "fix" I mentioned above didn't end up solving the problem, and if anything we actually saw the crashes peak over the past week or two. The root cause of the problem definitely seems to be some garbage (Mosquitto debug logs) being pushed into the pipeline by the backend (unfortunately I can't do anything about that). I tried a couple of other iOS MQTT client libraries and they actually dealt way worse with this problem, crashing right away when connecting, or never managing to parse any data.

However, it seems – and this still needs more data points to make sure but at 40% adoption and zero crashes so far, it sure looks promising – that my latest update got rid of the crashes simply by holding onto each MQTTSession instance for 30 more seconds after it disconnects before allowing it to be deallocated. The garbage in the data pipeline remains, but this way it doesn't seem to lead to the memory problems. (I'll post another update after I've seen whether the crashes truly are gone now.)

repomies avatar Sep 03 '17 11:09 repomies

Yes, this time the crashes definitely seem to be gone for good. Still zero crashes reported for the newest app version after five days (we used to have at least a dozen every day).

repomies avatar Sep 06 '17 21:09 repomies

I have the same crash in Fabric. Any updates on this? @repomies

pablogeek avatar Sep 19 '17 21:09 pablogeek

@pablogeek yes, looks like in my case it completely disappeared after I deployed the fix described above – just holding onto the MQTTSession instance for extra 30 seconds after disconnecting it (and probably that's much longer than needed) so that the MQTTDecoder instance doesn't end up deallocated when it's still trying to process an invalid message.

repomies avatar Sep 28 '17 17:09 repomies

@repomies I am also facing the same issue. How do you hold the session after disconnection? A sample code would help. Thanks.

sharonSilicus avatar Oct 26 '17 06:10 sharonSilicus

Can you describe what you are doing with MQTT? Are you subscribing to a topic with a lot of large retained messages and QoS0, which causes the broker to flood the client with data? Or do you use QoS1/2 but do not have specified a reasonable max_inflight_messages.

I do not want to say the library is error-free, but under "normal" circumstances it does behave quite stable

ckrey avatar Oct 26 '17 16:10 ckrey

@ckrey While the application is moving to background, I close the connection to the MQTT server its an Azure IoT Hub. While coming back to foreground, I try to connect to MQTT server again. Once in a couple of days the crash appears.

- (void)applicationDidEnterBackground:(UIApplication *)application {
    // Use this method to release shared resources, save user data, invalidate timers, and store enough application state information to restore your application to its current state in case it is terminated later.
//     If your application supports background execution, this method is called instead of applicationWillTerminate: when the user quits.
    MQTTSession *session = [[MQTTSessionManager sharedManager] session];
    [session closeAndWait:1];
}
- (void)applicationDidBecomeActive:(UIApplication *)application {
    // Restart any tasks that were paused (or not yet started) while the application was inactive. If the application was previously in the background, optionally refresh the user interface.
    
    MQTTSession *session = [[MQTTSessionManager sharedManager] session];

    if ([session status] == MQTTSessionStatusClosed ) {
     
        NSString *clientID = [ApplicationManager sharedManager].clientID;
        NSString *sas = [ApplicationManager sharedManager].sasKey;
        NSString *hostURL = [ApplicationManager sharedManager].hostURL;
        
        [[MQTTSessionManager sharedManager] setHost:hostURL];
        [[MQTTSessionManager sharedManager] setClientID:clientID];
        
        [[MQTTSessionManager sharedManager] connectWithUserName:[NSString stringWithFormat:@"%@/%@",hostURL,clientID] andPassword:sas];
    }
    
}

sharonSilicus avatar Oct 27 '17 04:10 sharonSilicus

Relevant probably: https://github.com/facebook/SocketRocket/issues/532

jcavar avatar Oct 31 '17 10:10 jcavar

I believe this bug is fixed on master: https://github.com/novastone-media/MQTT-Client-Framework/blob/master/MQTTClient/MQTTClient/MQTTCFSocketTransport.m#L39-L46

We had to move to using queues instead of run loop which is probably better anyway. There are some bits to resolve before making official release, but feel free to try it out from master.

jcavar avatar Nov 14 '17 14:11 jcavar

Ok, 0.10.0 version is on CocoaPods and this issue should be fixed I believe. Please give it a try.

jcavar avatar Dec 14 '17 07:12 jcavar

Using MQTTClient 0.12.0, still many crashes like this for my app, [MQTTDecoder stream:handleEvent:] @jcavar

saberChen avatar May 14 '18 02:05 saberChen

Do you have any idea how it happens?

jcavar avatar May 14 '18 07:05 jcavar

I think the crash is related to a synchronization issue in close method (MQTTDecoder.m:88) leading to a deadlock. I believe most of the clients experience it on the main thread as it's the default configuration.

 if (self.queue != dispatch_get_specific(&QueueIdentityKey)) {
        dispatch_sync(self.queue, ^{ // deadlock
            [self internalClose];
        });
    }

Existing implementation seems to relay on the assumption that two different queues cannot perform blocks on the same thread. This assumption is not correct as GCD may skip switching threads when it's applicable (for performance reasons).

Here is a test that should reproduce it.

#import <XCTest/XCTest.h>
#import "MQTTDecoder.h"

@interface MQTTDecoderTests : XCTestCase

@end

@implementation MQTTDecoderTests

- (void)testCloseFromNonMainDispachQueueOnMainThread {
    MQTTDecoder *sut = [[MQTTDecoder alloc] init];
    dispatch_queue_t customQueue = dispatch_queue_create("mqttdecodertests.queue", nil);

    dispatch_sync(customQueue, ^{
        [sut close];
    });

    XCTAssertTrue(YES); // test passed if this line is reached
}

@end

The fix is not trivial as the MQTTDecoder using externally available queue. My naive fix would at least make some cases working, and throw exceptions if the code cannot be performed synchronously in a safe way.

- (void)close {
    BOOL dispatchedOnDecoderQueue = self.queue == dispatch_get_specific(&QueueIdentityKey);
    BOOL calledOnMainThreadAndDecoderUsingMainQueue = [NSThread isMainThread] && self.queue == dispatch_get_main_queue();
    BOOL canSafelyInlineInternalClose = dispatchedOnDecoderQueue || calledOnMainThreadAndDecoderUsingMainQueue;
    if (!canSafelyInlineInternalClose) {
        @throw [NSException exceptionWithName:NSInternalInconsistencyException
                                       reason:@"Cannot close MQTTDecoder synchronously"
                                     userInfo:nil];
    }

    [self internalClose];
}

Maybe we could improve the fix by replacing the exception with a code niling the delegate and dispatching internalClose asynchronously? I am not sure if that would have any bad implications.

marciniwanicki avatar Aug 22 '18 18:08 marciniwanicki

This is still happen because QueueIdentityKey = NULL.

I have fixed by this way, and it works.

@interface MQTTDecoder() {
    void *QueueIdentityKey;
}

@property (nonatomic) NSMutableArray<NSInputStream *> *streams;
@property (readonly) void *QueueIdentityKey;

@end
- (void *)QueueIdentityKey {
    return QueueIdentityKey;
}
- (void)setQueue:(dispatch_queue_t)queue {
    _queue = queue;
    
    // We're going to use dispatch_queue_set_specific() to "mark" our queue.
    // The dispatch_queue_set_specific() and dispatch_get_specific() functions take a "void *key" parameter.
    // Later we can use dispatch_get_specific() to determine if we're executing on our queue.
    // From the documentation:
    //
    // > Keys are only compared as pointers and are never dereferenced.
    // > Thus, you can use a pointer to a static variable for a specific subsystem or
    // > any other value that allows you to identify the value uniquely.
    //
    // So we're just going to use the memory address of an ivar.
    
    QueueIdentityKey = &QueueIdentityKey;
    dispatch_queue_set_specific(_queue, QueueIdentityKey, QueueIdentityKey, NULL);
}
- (void)close {
    // https://github.com/novastone-media/MQTT-Client-Framework/issues/325
    dispatch_block_t block = ^{
        [self internalClose];
    };
    
    if (dispatch_get_specific(QueueIdentityKey))
        block();
    else
        dispatch_sync(_queue, block);
}

nnmanhit avatar Aug 30 '18 10:08 nnmanhit

We merged this a while ago I think. Is this issue still happening?

jcavar avatar May 31 '19 14:05 jcavar

I’m sorry I can’t answer for myself, since our app’s backend infrastructure unfortunately no longer exists.

repomies avatar May 31 '19 16:05 repomies

This issue also happened in my application lately, using version 0.15.2

avichayVonage avatar Jun 12 '19 16:06 avichayVonage

Can you provide more information about how it happens (logs, conditions, steps)?

jcavar avatar Jun 22 '19 09:06 jcavar

https://github.com/novastone-media/MQTT-Client-Framework/issues/526

jcavar avatar Oct 15 '19 13:10 jcavar

Crashed: com.apple.main-thread 0 libsystem_kernel.dylib 0x1a5cc7db8 close + 8 1 CoreFoundation 0x1a5e6671c CFSocketInvalidate + 820 2 CFNetwork 0x1a92734ec _CFStreamErrorFromCFError + 158756 3 CoreFoundation 0x1a5e708cc _CFStreamClose + 108 4 MQTTClient 0x105c7e134 -[MQTTCFSocketDecoder close] + 60 5 MQTTClient 0x105c7f6d4 -[MQTTCFSocketTransport internalClose] + 244 6 MQTTClient 0x105c8c718 -[MQTTSession closeInternal] + 332 7 MQTTClient 0x105c91130 -[MQTTSession error:error:] + 224 8 MQTTClient 0x105c90e80 -[MQTTSession connectionError:] + 60 9 MQTTClient 0x105c7f880 -[MQTTCFSocketTransport encoder:didFailWithError:] + 104 10 MQTTClient 0x105c7e90c -[MQTTCFSocketEncoder stream:handleEvent:] + 508 11 MQTTClient 0x105c988f8 -[MQTTSSLSecurityPolicyEncoder stream:handleEvent:] + 124 12 CoreFoundation 0x1a5e746ac _signalEventSync + 216 13 CoreFoundation 0x1a5e747b8 ___signalEventQueue_block_invoke + 28 14 libdispatch.dylib 0x1a5b7eec4 _dispatch_call_block_and_release + 32 15 libdispatch.dylib 0x1a5b8033c _dispatch_client_callout + 20 16 libdispatch.dylib 0x1a5b8c600 _dispatch_main_queue_callback_4CF + 832 17 CoreFoundation 0x1a5e5b41c CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE + 16 18 CoreFoundation 0x1a5e56034 __CFRunLoopRun + 1708 19 CoreFoundation 0x1a5e55660 CFRunLoopRunSpecific + 480 20 GraphicsServices 0x1b0266604 GSEventRunModal + 164 21 UIKitCore 0x1aa02a15c UIApplicationMain + 1944 22 HubbleConnected 0x1028d0988 main + 14 (main.m:14) 23 libdyld.dylib 0x1a5cd11ec start + 4

Still we are getting this crash in iOS

shashidj avatar Jun 12 '20 17:06 shashidj

I think should avoid doing some potentially risky things in the dealloc method.

pantaopt avatar Aug 06 '20 06:08 pantaopt