grpc-swift

Cgrpc keeps crashing randomly

Open m-bacevicius opened this issue 4 years ago • 15 comments

Problem

The application keeps crashing randomly.

Version

Using SwiftGRPC v0.11.0

Thread 21 Crashed:
0   libsystem_kernel.dylib          0x30c541d88         __pthread_kill
1   libsystem_pthread.dylib         0x18c0ef74c         <redacted>
2   libsystem_c.dylib               0x18c03e934         abort
3   grpc                            0x1063a3480         gpr_malloc (alloc.cc:59)
4   SwiftGRPC                       0x1058d1bd4         cgrpc_metadata_array_copy (metadata_shim.c:83)
5   SwiftGRPC                       0x1058d3f34         Call.start (Call.swift:120)
6   SwiftGRPC                       0x1058e1340         ClientCallUnaryBase.start (ClientCallUnary.swift:49)
7   SwiftGRPC                       0x1058e0eb4         ClientCallUnaryBase.run (ClientCallUnary.swift:30)
8   MyApp                           0x204a03628         [inlined] Chat_AccountServiceServiceClient.requestUpdateToken (MyApp.grpc.swift:906)
9   MyApp                           0x204a03628         [inlined] Chat_AccountServiceServiceClient (<compiler-generated>:904)
10  MyApp                           0x204a03628         [inlined] Chat_AccountServiceService.requestUpdateToken (MyApp.grpc.swift:666)

Any help to avoid this crash would be much appreciated.

m-bacevicius avatar Jun 19 '20 12:06 m-bacevicius

This crash is happening when a call to malloc is returning NULL. This is either going to be an initialisation issue or a system state issue. Assuming this is crashing during the runtime of the program, it seems likely this is related to system state. Where are you running this program, and what's the state of the system?

Lukasa avatar Jun 22 '20 06:06 Lukasa

The program runs on iPhones with iOS 13.3 through 13.5.1. Can you elaborate on what you mean by the state of the system?

m-bacevicius avatar Jun 22 '20 07:06 m-bacevicius

So in general malloc should not return NULL for non-zero-sized allocations. cgrpc certainly doesn't expect it to. So we need to try to investigate why it's returning NULL.

As there are no calls to gpr_set_allocation_functions that I can see in the code, it doesn't seem like we're shimming out to something else. So the question becomes, what is going on with the app when these allocations fail? Is it using excessive amounts of memory? Is it backgrounded? Without knowing this it's hard to know.

Lukasa avatar Jun 22 '20 09:06 Lukasa
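
To make the failure mode in the stack trace concrete (gpr_malloc calling abort), here is a minimal sketch of an abort-on-NULL allocation wrapper. It is an illustration only, not the gRPC source: checked_malloc, system_alloc, and AllocFn are hypothetical names, and the injectable allocator parameter is an assumption added so the wrapper can be exercised in isolation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical allocator signature; not part of gRPC.
using AllocFn = void* (*)(std::size_t);

// Default allocator: forwards to the system malloc.
inline void* system_alloc(std::size_t size) { return std::malloc(size); }

// Sketch of an allocator wrapper that treats a NULL result as fatal.
// Callers never see NULL for a non-zero request, which is why a failing
// malloc surfaces as abort() in a crash log rather than as an error code.
void* checked_malloc(std::size_t size, AllocFn alloc_fn = system_alloc) {
    assert(alloc_fn != nullptr);
    if (size == 0) return nullptr;   // zero-sized requests are not treated as failures
    void* p = alloc_fn(size);
    if (p == nullptr) std::abort();  // matches the abort frame above gpr_malloc in the trace
    return p;
}
```

With a wrapper like this, the crash site only says that the underlying allocator returned NULL; it does not say why, which is the reason the questions above focus on the state of the process and the device.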

The application does not use excessive amounts of memory. The issue is most apparent when the application is starting. I should also note that crashes have been seen in the middle of application use (not while the application is being started, killed, or moved to the background).

m-bacevicius avatar Jun 22 '20 10:06 m-bacevicius

Are you able to reproduce this crash yourself?

Lukasa avatar Jun 22 '20 10:06 Lukasa

Not consistently. It's totally random as far as I'm aware. I can add more crash logs if that would help.

m-bacevicius avatar Jun 22 '20 10:06 m-bacevicius

I think if you are at least able to sometimes reproduce this crash it would be good to try to run your app with AddressSanitizer enabled to see if it reports anything.

Lukasa avatar Jun 22 '20 11:06 Lukasa

I'll try, but I doubt that I'll be able to reproduce the crash deliberately.

m-bacevicius avatar Jun 22 '20 11:06 m-bacevicius

Sorry, I'm not able to reproduce the bug with the debugger attached. Do you have any news from your side?

m-bacevicius avatar Jun 26 '20 10:06 m-bacevicius

@spookytime what about Thread Sanitizer or Address Sanitizer?

weissi avatar Jun 26 '20 10:06 weissi

@weissi I haven't seen any errors from Address Sanitizer.

m-bacevicius avatar Jun 26 '20 10:06 m-bacevicius

@spookytime hmm, maybe try Thread Sanitizer?

weissi avatar Jun 26 '20 10:06 weissi

@weissi Will do!

m-bacevicius avatar Jun 26 '20 11:06 m-bacevicius

Got some output from the sanitizer:


Pods/gRPC-Core/src/core/lib/gprpp/inlined_vector.h:129:28: runtime error: constructor call on misaligned address 0x000165d5ae78 for type 'grpc_core::channelz::CallCountingHelper::AtomicCounterData', which requires 64 byte alignment
0x000165d5ae78: note: pointer points here
 01 00 00 00  be be be be be be be be  be be be be be be be be  be be be be be be be be  be be be be
              ^ 
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior Pods/gRPC-Core/src/core/lib/gprpp/inlined_vector.h:129:28 in 
Pods/gRPC-Core/src/core/lib/channel/channelz.h:133:5: runtime error: constructor call on misaligned address 0x000165d5ae78 for type 'grpc_core::channelz::CallCountingHelper::AtomicCounterData *', which requires 64 byte alignment
0x000165d5ae78: note: pointer points here
 01 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^ 
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior Pods/gRPC-Core/src/core/lib/channel/channelz.h:133:5 in 
Pods/gRPC-Core/src/core/lib/channel/channelz.h:133:5: runtime error: constructor call on misaligned address 0x000165d5ae78 for type 'grpc_core::channelz::CallCountingHelper::AtomicCounterData *', which requires 64 byte alignment
0x000165d5ae78: note: pointer points here
 01 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^ 
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior Pods/gRPC-Core/src/core/lib/channel/channelz.h:133:5 in 
Pods/gRPC-Core/src/core/lib/gprpp/inlined_vector.h:92:12: runtime error: reference binding to misaligned address 0x000165d5ae78 for type 'grpc_core::channelz::CallCountingHelper::AtomicCounterData', which requires 64 byte alignment
0x000165d5ae78: note: pointer points here
 01 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^ 
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior Pods/gRPC-Core/src/core/lib/gprpp/inlined_vector.h:92:12 in 
Pods/gRPC-Core/src/core/lib/channel/channelz.cc:118:7: runtime error: reference binding to misaligned address 0x000165d5ae78 for type 'grpc_core::channelz::CallCountingHelper::AtomicCounterData', which requires 64 byte alignment
0x000165d5ae78: note: pointer points here
 01 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^ 
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior Pods/gRPC-Core/src/core/lib/channel/channelz.cc:118:7 in 
Pods/gRPC-Core/src/core/lib/channel/channelz.cc:130:3: runtime error: member access within misaligned address 0x000166650fd0 for type 'grpc_core::channelz::CallCountingHelper::AtomicCounterData', which requires 64 byte alignment
0x000166650fd0: note: pointer points here
 01 00 00 00  01 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  c8 16 e7 69
              ^ 
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior Pods/gRPC-Core/src/core/lib/channel/channelz.cc:130:3 in 

Pods/gRPC-Core/src/core/lib/gprpp/inlined_vector.h:200:18: runtime error: reference binding to misaligned address 0x000165d92678 for type 'grpc_core::channelz::CallCountingHelper::AtomicCounterData', which requires 64 byte alignment
0x000165d92678: note: pointer points here
 01 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00

m-bacevicius avatar Jun 29 '20 08:06 m-bacevicius
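
The report above says each address must be 64-byte aligned. As a rough sketch of what that check means, the helper below tests whether an address is a multiple of a power-of-two alignment and applies it to the addresses from the log. The names is_aligned, misalignment, and PaddedCounter are hypothetical; PaddedCounter only stands in for a type declared with 64-byte alignment and is not the gRPC type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns how many bytes `address` sits past the previous `alignment` boundary.
// `alignment` must be a non-zero power of two, as C++ alignments are.
std::size_t misalignment(std::uint64_t address, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return static_cast<std::size_t>(address & (alignment - 1));
}

// True when `address` is a multiple of `alignment`.
bool is_aligned(std::uint64_t address, std::size_t alignment) {
    return misalignment(address, alignment) == 0;
}

// Hypothetical stand-in for a type that requests 64-byte alignment.
struct alignas(64) PaddedCounter {
    std::int64_t value = 0;
};
static_assert(alignof(PaddedCounter) == 64, "type requests 64-byte alignment");
static_assert(sizeof(PaddedCounter) == 64, "size is rounded up to the alignment");
```

Applied to the log, 0x000165d5ae78 and 0x000165d92678 both sit 56 bytes past a 64-byte boundary, and 0x000166650fd0 sits 16 bytes past one, so none of them satisfies the 64-byte requirement. All three are still 8-byte aligned, which would be consistent with storage that was obtained with a smaller alignment guarantee than the type asks for, though the log alone does not confirm where the storage came from.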

This is a UBSan issue. It's not clear that it is directly causing this behaviour, though undefined behaviour is viral so it could be.

Maybe this is a duplicate of grpc/grpc#21466. If it is, that's not terribly encouraging as that issue is closed without resolution.

Lukasa avatar Jun 29 '20 11:06 Lukasa