libdatadog
libdatadog copied to clipboard
Add `initialize_before_fork` functions
What does this PR do?
Adds initialize_before_fork() functions.
Also re-organizes the includes and mods section a bit to group things a bit more.
Motivation
From one of the function's documentation:
/// On Apple platforms in particular, some things need to be initialized
/// before a fork, and ideally before threads are created.
/// Users may run into errors like the following if they do not:
/// > objc[25938]: +[__NSCFConstantString initialize] may have been in
/// > progress in another thread when fork() was called. We cannot safely
/// > call it or ignore it in the fork() child process. Crashing instead. Set
/// > a breakpoint on objc_initializeAfterForkError to debug.
Additional Notes
If the codebase using libdatadog might fork, then this should be called before forking, at least on Apple platforms.
The profiling::exporter module re-exports the ddcommon::connector's function, and the profiling-ffi::exporter uses the profiling::exporter version. This organization made sense to me at the time but I am having trouble defending it so I'm happy to hear your thoughts.
How to test the change?
Call one of the initialize_before_fork functions on an Apple platform, then fork, and note that you don't crash, whereas on older versions of libdatadog you might (depends potentially on if anything else has fully initialized the Apple frameworks). Note that you don't need to use the return value for this use case, though you could if you were interested in logging failures.
Ahh... This is interesting, since I've seen it as well before: https://github.com/DataDog/libddprof/issues/61 .
I do have a small suggestion. Since the intention of the libdatadog callers is not to load_root_certs (that's an implementation detail for libdatadog), do you think we could expose a safely_initialize_macos_resources() (safely_initialize_before_fork()?) or something similar that would then call into this function?
This way we would abstract what's going on exactly, and could change the details whenever, without having such an internal detail of libdatadog need to be exposed directly.
(Also, it would be really nice if this was exposed via the ffi! :P)
Interestingly, Ruby recently added a workaround that sounds similar: https://github.com/ruby/ruby/commit/e3cc1a6cae0e6c88c04cd54c3afa3c022bb6772c
Indeed, there is a whole class of types out there that can trigger these issues. I don't know the Apple ecosystem well enough to say, "yes, definitely, we've fixed all the things" so I guess we'll be playing whack-a-mole. Exposing it through FFI is important too, good call.
I've updated this to your suggestions, @ivoanjo, and also to resolve a conflict with the main branch 👍🏻
I was able to re-test this with Ruby and I still get crashes in the child process. Here's the relevant part of the stack trace for the crash:
Thread 0 Crashed:: Dispatch queue: com.apple.security.keychain-cache-queue
0 libsystem_kernel.dylib 0x00007fff2049791e __pthread_kill + 10
1 libsystem_pthread.dylib 0x00007fff204c65bd pthread_kill + 263
2 libsystem_c.dylib 0x00007fff2041b406 abort + 125
3 libruby.2.7.dylib 0x00000001043d5359 die + 9
4 libruby.2.7.dylib 0x00000001043d54f3 rb_bug_for_fatal_signal + 403
5 libruby.2.7.dylib 0x00000001044eac12 sigsegv + 82
6 libsystem_platform.dylib 0x00007fff2050bd7d _sigtramp + 29
7 ??? 0x00007f89ed526190 0 + 140230368846224
8 com.apple.security 0x00007fff225b9b79 invocation function for block in Security::KeychainCore::StorageManager::tickleKeychain(Security::KeychainCore::KeychainImpl*) + 76
9 libdispatch.dylib 0x00007fff2031c806 _dispatch_client_callout + 8
10 libdispatch.dylib 0x00007fff203295ce _dispatch_lane_barrier_sync_invoke_and_complete + 60
11 com.apple.security 0x00007fff225b9a31 Security::KeychainCore::StorageManager::tickleKeychain(Security::KeychainCore::KeychainImpl*) + 535
12 com.apple.security 0x00007fff2237c14e Security::KeychainCore::KCCursorImpl::next(Security::KeychainCore::Item&) + 302
13 com.apple.security 0x00007fff225c663a Security::KeychainCore::TrustSettings::findQualifiedCerts(std::__1::vector<Security::KeychainCore::Keychain, std::__1::allocator<Security::KeychainCore::Keychain> >&, bool, bool, cssm_data const*, char const*, unsigned int, __CFArray*) + 252
14 com.apple.security 0x00007fff225b560c SecTrustSettingsCopyCertificates + 505
15 libdatadog_profiling.dylib 0x0000000106217dae security_framework::trust_settings::TrustSettings::iter::h63e45bed18fb1639 + 25 (trust_settings.rs:104) [inlined]
16 libdatadog_profiling.dylib 0x0000000106217dae rustls_native_certs::macos::load_native_certs::h13356bc16655a82b + 136 (macos.rs:24) [inlined]
17 libdatadog_profiling.dylib 0x0000000106217dae core::ops::function::FnOnce::call_once::hec68719506ba1e77 + 136 (function.rs:248) [inlined]
18 libdatadog_profiling.dylib 0x0000000106217dae core::option::Option$LT$T$GT$::unwrap_or_else::h5e0faed1e3ce7d22 + 136 (option.rs:825) [inlined]
19 libdatadog_profiling.dylib 0x0000000106217dae rustls_native_certs::load_native_certs::h90685e5da29cf340 + 314 (lib.rs:50)
20 libdatadog_profiling.dylib 0x00000001061811c8 ddcommon::connector::load_root_certs::h85c62c2c73d6cf4d + 57 (mod.rs:84)
21 libdatadog_profiling.dylib 0x00000001061235fd ddcommon::connector::build_https_connector::he7e610d954436c28 + 5 (mod.rs:69) [inlined]
22 libdatadog_profiling.dylib 0x00000001061235fd ddcommon::connector::Connector::new::ha077e2da0c75f5e3 + 5 (mod.rs:38) [inlined]
23 libdatadog_profiling.dylib 0x00000001061235fd datadog_profiling::exporter::Exporter::new::h83d1b824a33f1a6f + 110 (mod.rs:254) [inlined]
24 libdatadog_profiling.dylib 0x00000001061235fd datadog_profiling::exporter::ProfileExporter::new::h7270b40f967a9bd3 + 110 (mod.rs:128) [inlined]
25 libdatadog_profiling.dylib 0x00000001061235fd datadog_profiling_ffi::exporter::profile_exporter_new::_$u7b$$u7b$closure$u7d$$u7d$::hbe34ceef863b4739 + 2580 (exporter.rs:150) [inlined]
26 libdatadog_profiling.dylib 0x00000001061235fd ddog_ProfileExporter_new + 4133 (exporter.rs:144)
27 ddtrace_profiling_native_extension.2.7.6_x86_64-darwin20.bundle 0x0000000106115791 create_exporter + 1457 (http_transport.c:102)
28 ddtrace_profiling_native_extension.2.7.6_x86_64-darwin20.bundle 0x0000000106114926 _native_validate_exporter + 70 (http_transport.c:79)
I did make sure to call the new method, and the result is OK. Maybe the issue is that we're still calling load_root_certs in the fork, and perhaps we should just cache that information?
I think in osx you can not use keychains after fork
- man page of fork: https://www.unix.com/man-page/osx/2/fork/
There are limits to what you can do in the child process. To be totally safe you should restrict yourself to only executing async-signal safe operations until such time as one of the exec functions is called. All APIs, including global data symbols, in any framework or library should be assumed to be unsafe after a fork() unless explicitly documented to be safe or async-signal safe. If you need to use these frameworks in the child process, you must exec. In this situation it is reasonable to exec yourself.
- The idea would be to drop everything after fork ?
- Similar discussions for python: https://github.com/jaraco/keyring/issues/281
Quoted differently so it's easier to read:
There are limits to what you can do in the child process. To be totally safe you should restrict yourself to only executing async-signal safe operations until such time as one of the exec functions is called. All APIs, including global data symbols, in any framework or library should be assumed to be unsafe after a fork() unless explicitly documented to be safe or async-signal safe. If you need to use these frameworks in the child process, you must exec. In this situation it is reasonable to exec yourself.
I believe this is why Mac wants you to initialize these things before you fork, and detects such issues and crashes otherwise. We should be fine to initialize the things we need, though since we're not experts it may take some time to iron all of them out.
I'm also fine to not support HTTPS on Mac specifically if it's feasible to implement.
Lastly, fork is terrible and I pray for the days that it can be completely deprecated in favor of newer things like posix_spawn or io_uring_spawn.
I think we can close until we go back to implementing macOS support.