kotlinx.coroutines
kotlinx.coroutines copied to clipboard
Get rid of Thread.name augmentation with CoroutineId when debug mode …
…is enabled
See #3677 and #2234 for the rationale
Fixes #2234
@dkhalanskyjb WDYT? Do you have any objections or comments?
This PR is draft because it has a bunch of tests failing, mostly guide-related once where coroutine id is part of the output. I also deliberately avoid adding a feature flag to revert the behaviour -- that way we'll see if anybody actually ever relied on this behaviour and will be able to make much more weighted decision (instead of the maintenance of such flag nearly forever)
- When I google
"coroutinename", most of the top results do mention/request naming threads. Plenty of tutorials and internal lore exist around the existing behavior of the debug mode. - This behavior seems to be also used for debugging: https://grep.app/search?q=Log%5C.d.%2AThread%5C.current®exp=true&filter[lang][0]=Kotlin
- The way I read the issue about the ART crash, it seems to happen in production but not during debugging, so it should not interfere with the purpose of the debug mode.
- Performance is typically expected to suffer a bit during debugging.
I don't understand why the debug mode is enabled in production, and from the discussion in #2234, our users don't, too. Something somewhere introduces the debug mode behavior, most of which is typically undesirable in production, not just this part. That's the actual root of the issue, the way I see it. As or Thread.name, yes, if we look at it as a feature for production builds, it's really problematic, but as debug functionality, I think it could be nice.
One thing worries me here. If we had ideal working coroutine debugging tools, we would not need Thread.name augmentation feature at all. But we don't. In particular, if I run the code with coroutines with under debugger and just press "pause", I don't see any coroutine information in the debugger. Currently, the augmented coroutine name would be my only way to figure out what's going on. Would not it be safe to leave it under a flag?
I don't understand why the debug mode is enabled in production
I tend to think completely the opposite. We should eliminate as many barriers as possible for folks to keep debug mode enabled (the same way a lot of folks have -ea in production) -- performance, non-trivial behaviours, and cumbersome scenarios.
The reason it is not enabled by default is straightforward -- it changes the semantics of the application: thread names, exception identities, slightly slowing down the performance etc. and it is not acceptable as the default for a sufficiently large framework.
Yet it would be nice to eliminate as many barriers as possible for an average user to have better diagnostics, observability and additional info as possible; coroutines are already hard to debug, and that's the reason we have requests like #3611 and DebugProbes improvements (that are enabled by default in all IDEA products).
For me, it seems like changing thread's name is not carrying its own weight and I'm yet to find the use of this behaviour for an actual app in the wild.
Roman has a good observation about the actual debugger though, it would be nice to keep this behaviour unless IDEA's debugger automatically augments thread's names in its tooltips. Taking into account current state of IDE team priorities, It's unlikely to happen in an observable timeframe though
Debugging is when you dive into code to find a bug. At that point, you disable compiler optimizations (where applicable), enable the (often stripped) debug symbols like function names, and don't care even if your code runs twice as slow as it does in production, as you want all the help you can get. For example, it's not inconceivable to record the complete execution of a program when debugging to obtain a consistent reproducer (https://rr-project.org/), or to instrument memory reads and writes (https://valgrind.org/, https://github.com/google/sanitizers/wiki/AddressSanitizer). In production, however, a ≈50x slowdown is often out of the question.
Clearly, diagnostics are often useful in production, despite their cost. After all, there's the whole sprawling telemetry industry, and its role is not limited to just building advertisement profiles, and telemetry is often heavier than, say, stacktrace recovery. The idea behind https://github.com/Kotlin/kotlinx.coroutines/issues/3677 also seems solid to me: yes, what we have now for debugging can also be useful for diagnostics. However, there's a spectrum between "everything runs as fast as it can, but we get zero visibility into what's going on" and "everything is instrumented, the full logs are stored, and even the slightest anomalies are detected and reported," and my argument is making it feasible to enable debug mode in production, essentially jamming debugging and observing into a single point of that spectrum, is not a worthy goal.
Good point on telemetry tools. Our goal should be providing easy-to-use APIs for telemetry tools and working with the tool vendors so that they add proper support. Changing the thread name is just a cheap hack to get this kind of integration "for free" as in "zero new APIs", but, as it turns out, it is not free at all in terms of runtime cost.