Replace atos backtrace generation with a dSYM loader.

Open LunaTheFoxgirl opened this issue 8 months ago • 41 comments

This pull request replaces the atos-based backtrace generation with a small dSYM loader that maps the default dSYM location of an executable into the address space of the application.

This loader looks for a .dSYM bundle in the executable's directory, maps it into memory and passes it on to the DWARF state machine. It will not, however, map any dSYMs of loaded shared libraries currently; this is planned to be addressed in a future PR. Symbolicating OS libraries and symbolication by UUID are non-goals.
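As a rough illustration of the "default dSYM location" mentioned above: `dsymutil` by default writes its output next to the executable, as `<exe>.dSYM/Contents/Resources/DWARF/<exe file name>`. The sketch below only computes that path. The function name `defaultDsymPath` is hypothetical and not taken from this PR, and it uses Phobos for brevity, which the actual druntime code cannot do.

```d
import std.path : baseName, buildPath;

/// Hypothetical helper (not from the PR): returns the path at which
/// dsymutil places the DWARF companion file for `exePath` by default.
string defaultDsymPath(string exePath)
{
    // Layout: <exe>.dSYM/Contents/Resources/DWARF/<exe file name>
    return buildPath(exePath ~ ".dSYM", "Contents", "Resources", "DWARF",
                     baseName(exePath));
}
```

A real loader would still have to discover the executable's own path at runtime, check that the file exists, and map it into memory before handing it to the DWARF reader.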

Outdated Info

Using atos is not portable, given that:
1. It's a developer tool that may or may not exist on a given system
2. Said tool relies on being able to execute external applications at runtime; something the hardened runtime heavily restricts
3. Apple does not support this use case.

As an alternative a weak `rt_dwarfSymbolicate` symbol has been added, allowing dub libraries to implement this functionality instead so that the developer can opt-in to this non-portable behaviour.

Down the line I can investigate making a naïve dSYM lookup system, but without OS help -- that Apple will reject on AppStore review -- full coverage will be unlikely.

LunaTheFoxgirl avatar May 15 '25 16:05 LunaTheFoxgirl

Feedback that I gave on Discord, replicating here as requested.

The reasons for using dladdr and not using atos ext. should be documented as a comment in the code, so future people know why it was done this way, and when to trigger a replacement.

rikkimax avatar May 15 '25 16:05 rikkimax

Needs a cleanup function in case you need to allocate and free.

Do we need to have a function that processes the entire array at once? Why not a function that works on a single stack frame?

schveiguy avatar May 15 '25 17:05 schveiguy

> Do we need to have a function that processes the entire array at once? Why not a function that works on a single stack frame?

That gives greatest flexibility and reduces the need for some state/cache somewhere. The implementation will most likely have to read files etc., so processing the whole batch at once makes sense.

Edit: See e.g. the default implementation, processing the debug_line section until all addresses have been resolved: https://github.com/ldc-developers/ldc/blob/53ffdf11c9770a2b43ad36372122959bb3ce5997/runtime/druntime/src/core/internal/backtrace/dwarf.d#L405-L512

kinke avatar May 15 '25 17:05 kinke

I did not test this, but it looks like a regression to me (no more line info). I'm OK with making the atos thing opt-in, but I don't understand why it must be removed. Basically again we are left with backtraces without line info? Having to resort to a (yet non-existing) external implementations is very bad imo.

JohanEngelen avatar May 15 '25 21:05 JohanEngelen

dladdr has been removed and a cleanup hook added. Also moved them to be run before and after the rest of the symbol resolution.

LunaTheFoxgirl avatar May 16 '25 16:05 LunaTheFoxgirl

> I did not test this, but it looks like a regression to me (no more line info). I'm OK with making the atos thing opt-in, but I don't understand why it must be removed. Basically again we are left with backtraces without line info? Having to resort to a (yet non-existing) external implementations is very bad imo.

As said in #4895, having good stack traces is important, but those stack traces should not come at the cost of it being practically infeasible to use D to release stuff on the AppStore. At this point D is already capable of being used for that, and at least in my business it's a part of my future strategy to release i(Pad)OS versions of my software.

While yes, we could make the runtime only compile in atos stuff in debug mode, that does also mean there's extra friction for developers using DLang to develop apps; and also that it's likely to be useless anyways on Darwin derived OSes that don't have atos.

LunaTheFoxgirl avatar May 16 '25 16:05 LunaTheFoxgirl

> While yes, we could make the runtime only compile in atos stuff in debug mode, that does also mean there's extra friction for developers using DLang to develop apps;

What is that extra friction? Do people put debug builds in the app store?

> and also that it's likely to be useless anyways on Darwin derived OSes that don't have atos.

This is not a valid argument to make things worse for macOS.

How about providing a separate small library with rt_dwarfSymbolicate overrides that call into atos, so that people can at least have the old (better ;)) behavior by adding -lsomelib to the compiler invoke? (something similar to the ldc_rt.dso.obj lib on windows)

JohanEngelen avatar May 16 '25 16:05 JohanEngelen

> > While yes, we could make the runtime only compile in atos stuff in debug mode, that does also mean there's extra friction for developers using DLang to develop apps;

> What is that extra friction? Do people put debug builds in the app store?

> > and also that it's likely to be useless anyways on Darwin derived OSes that don't have atos.

> This is not a valid argument to make things worse for macOS.

> How about providing a separate small library with rt_dwarfSymbolicate overrides that call into atos, so that people can at least have the old (better ;)) behavior by adding -lsomelib to the compiler invoke? (something similar to the ldc_rt.dso.obj lib on windows)

This is not only about the App Store; it also affects things such as making builds with the hardened runtime, which can end up giving you a headache if you want to sign and notarize your apps as well (including for macOS). Overall, I don't agree that this is the correct approach here.

And yeah, some may want to put release builds with debug info, for example, onto the App Store, and may not know to pass in extra LDC-specific flags to link to the non-debug version of druntime, etc. While those can be somewhat solved by making dub more intelligent on the matter, it's just overall a massive ugly hack, and your application freezing for a couple of seconds while atos runs every time you generate a stack trace is a little excessive.

Also if you only care about macOS, just use lldb, it'll do the symbolication for you when something crashes. It comes with the dev tools.

LunaTheFoxgirl avatar May 16 '25 16:05 LunaTheFoxgirl

Also, as a side note, it's the default behaviour to generate debug symbols for releases of i(Pad)OS apps and the like; Xcode does it for you on compile when using Swift or Objective-C, so it is overall expected that you will in fact have the debug symbols there even if it's technically in release mode. So if we, for example, just detected whether the auto-generated dSYMs were there (which Apple also forcefully does for you once you publish an app) instead of tying it to release-debug, then atos would still possibly be run. Additionally, having strings in druntime relating to atos might be enough to trigger a review rejection.

LunaTheFoxgirl avatar May 16 '25 16:05 LunaTheFoxgirl

My opinion is that the easy experience for D should have stack traces with file/line. You need them when you are learning D, so that is what should be the default.

I'm OK with making D easier to release on the app store, and I'm hoping there's a way we can make it easy to do (it's OK to require some extra effort for this, with documentation). But it would be bad for the first experience with D to be unreadable stack traces.

schveiguy avatar May 16 '25 17:05 schveiguy

> My opinion is that the easy experience for D should have stack traces with file/line. You need them when you are learning D, so that is what should be the default.

> I'm OK with making D easier to release on the app store, and I'm hoping there's a way we can make it easy to do (it's OK to require some extra effort for this, with documentation). But it would be bad for the first experience with D to be unreadable stack traces.

As I said in the other thread, besides providing these hooks, making a new utility that embeds dSYMs into the DWARF section might be the way to go. That means LDC avoids ugly hacks in its source tree that just make it more difficult for existing D users to get their work done. My main goal right now is to ensure we root out the ugly hacks that break production software for businesses like mine or Auburn Sounds, who rely on D and LDC to make a living.

Once that's done, it'll be easier to take a holistic look at how best to approach the shortcomings that removing these hacks creates, through better tooling or writing custom implementations where needed.

LunaTheFoxgirl avatar May 16 '25 17:05 LunaTheFoxgirl

> My opinion is that the easy experience for D should have stack traces with file/line. You need them when you are learning D, so that is what should be the default.

Yeah my main concern is exactly that - the default experience on a dev box at least should be stack traces with resolved source Locs. Ideally with an acceptable runtime overhead etc., but that's secondary.

So a solution that depends on libatos (dynamically, i.e., copes with it not being available) would be fine to me as well - we depend on poorly documented stuff on Darwin already, that whole TLS disaster with macOS 15.4 was caused by Apple removing an API in macOS 10.15, and Jacob having to re-implement those finicky details in upstream druntime again. This stuff breaking was just a question of time I guess, and might happen again anytime.

kinke avatar May 16 '25 17:05 kinke

One option here might be to add a dedicated object file that is linked automatically for executables on Posix. Pass a switch and it won't do this.

That object file can contain the atos stuff, giving the desired default.

However, this is unnecessary if dlopen approach can ship.

rikkimax avatar May 16 '25 17:05 rikkimax

> One option here might be to add a dedicated object file that is linked automatically for executables on Posix. Pass a switch and it won't do this.

> That object file can contain the atos stuff, giving the desired default.

> However, this is unnecessary if dlopen approach can ship.

I'm not sure it can; Apple also scans strings. They'd easily find out that you are trying to load atos, or that you do so at some point in the application's life cycle.

If you try to hide this kind of stuff from them, they might revoke your dev and signing access.

LunaTheFoxgirl avatar May 16 '25 17:05 LunaTheFoxgirl

On second thought: I'm currently down with a bit of a cold, so I think I should have a harder think about what kind of tooling could make things work out once I'm feeling better. So do hold off on merging this.

LunaTheFoxgirl avatar May 16 '25 18:05 LunaTheFoxgirl

Note that compiler-rt itself, in the sanitizer runtime, also shells out to atos on macOS for symbolizing. Has this caused any concrete issues with D projects being submitted to the App Store? Without any receipts to that end, I'm not sure whether there is something to fix here (though of course the implementation might have issues; #4895).

Using private frameworks directly is another issue, but we are not doing that. Of course, a more elegant solution could always be nice, but might be quite a bit of engineering effort.

dnadlinger avatar May 16 '25 19:05 dnadlinger

> Note that compiler-rt itself, in the sanitizer runtime, also shells out to atos on macOS for symbolizing. Has this caused any concrete issues with D projects being submitted to the App Store? Without any receipts to that end, I'm not sure whether there is something to fix here (though of course the implementation might have issues; #4895).

> Using private frameworks directly is another issue, but we are not doing that. Of course, a more elegant solution could always be nice, but might be quite a bit of engineering effort.

compiler-rt and the ASAN are not compiled in when you make release versions, so there it's irrelevant. The problem is that private APIs shouldn't make their way into production software; while you're developing and testing locally with self signed certs, it doesn't matter as much, even if you may end up with your application crashing due to using these things with no backtrace. At least you know what you're getting into there.

LunaTheFoxgirl avatar May 22 '25 15:05 LunaTheFoxgirl

v1.41.0 final won't take too long anymore, I wanna release it in the next 2 weeks or so.

How about making the existing resolveAddressesWithAtos() weak (incl. renaming + extern(C)), and allowing the user to override it with a one-line empty dummy function to disable atos? [I guess that could be wrapped in a dub sourceLibrary project - the only requirement is that the object file with the custom strong symbol is linked.]

kinke avatar Jun 03 '25 13:06 kinke

I'll get back to this soonish; I have some other stuff happening, which means I need to focus on other things. Instead of going the atos route, I'm thinking of adding logic that attempts to locate and load a dSYM file instead, since dSYM files are just Mach-O files that only contain DWARF sections.
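To make the "dSYM files are just Mach-O files" remark concrete, here is a minimal sketch of how a loader could sanity-check a mapped file before parsing it. It assumes a thin (non-universal), little-endian, 64-bit file; `readLE32` and `looksLikeDsym` are hypothetical names and not part of this PR. The constants are the values defined in `<mach-o/loader.h>`.

```d
/// Values from <mach-o/loader.h>.
enum uint MH_MAGIC_64 = 0xfeedfacf; // magic number of a 64-bit Mach-O file
enum uint MH_DSYM     = 0xa;        // filetype of a dSYM companion file

/// Hypothetical helper: reads a little-endian 32-bit value at `offset`.
uint readLE32(const(ubyte)[] data, size_t offset)
{
    return cast(uint) data[offset]
        | (cast(uint) data[offset + 1] << 8)
        | (cast(uint) data[offset + 2] << 16)
        | (cast(uint) data[offset + 3] << 24);
}

/// Hypothetical helper: true if `data` starts with a 64-bit Mach-O
/// header whose filetype marks it as a dSYM companion file.
bool looksLikeDsym(const(ubyte)[] data)
{
    // mach_header_64 is 32 bytes; magic is at offset 0, filetype at offset 12.
    if (data.length < 32)
        return false;
    return readLE32(data, 0) == MH_MAGIC_64
        && readLE32(data, 12) == MH_DSYM;
}
```

After such a check, a loader would walk the load commands to find the `__DWARF` segment and its sections (for example `__debug_line`). Universal binaries and 32-bit files would need extra handling that this sketch leaves out.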

LunaTheFoxgirl avatar Jun 08 '25 02:06 LunaTheFoxgirl

I've rewritten this PR to now include a subsystem that can mmap dSYM files when the line sections aren't in the file. This does mean dsymutil will need to be run to get backtraces.

If someone more experienced with the DWARF format could give me some hints on how to make this properly work, that'd be great. The file is read and I get no crashes, but so far it seems I don't get any line info.

LunaTheFoxgirl avatar Jul 06 '25 14:07 LunaTheFoxgirl

Update: I got it working!

luna@feixiao n_test % cat test.d
import std.stdio;
import core.sys.darwin.dlfcn;

void main() {
	myFunction();
}


void myFunction() {
	throw new Exception("Test");
}
luna@feixiao n_test % rm test test.o; ldc2-dev -g test.d; dsymutil test; ./test
object.Exception@test.d(10): Test
----------------
test.d:10 void test.myFunction() [0x10057c993]
test.d:5 _Dmain [0x10057c8fb]

LunaTheFoxgirl avatar Jul 06 '25 14:07 LunaTheFoxgirl

Perhaps we could get some tests?

jacob-carlborg avatar Jul 07 '25 05:07 jacob-carlborg

> Perhaps we could get some tests?

I want to, but I have no idea where to slot in dsymutil in the makefiles in the runtime tests. They're quite confusingly set up.

LunaTheFoxgirl avatar Jul 07 '25 08:07 LunaTheFoxgirl

> > Perhaps we could get some tests?

> I want to, but I have no idea where to slot in dsymutil in the makefiles in the runtime tests. They're quite confusingly set up.

The closest example I can think of is https://github.com/ldc-developers/ldc/blob/46bbe8b47f7a69aff651e913d80dd846e1d6f613/runtime/druntime/test/exceptions/Makefile#L125-L141

If it helps, your example can be roughly translated as:

diff --git a/runtime/druntime/test/dsymutil/Makefile b/runtime/druntime/test/dsymutil/Makefile
new file mode 100644
index 0000000000..9059819c2b
--- /dev/null
+++ b/runtime/druntime/test/dsymutil/Makefile
@@ -0,0 +1,49 @@
+ifdef IN_LDC
+# Include this for OS
+include ../../../../dmd/osmodel.mak
+endif
+
+ifeq ($(OS),osx)
+# Only define the tests on macos
+TESTS := my_example
+endif
+
+
+# common.mak picks up TESTS and defines default build rules
+include ../common.mak
+
+# Adds -g to all built executables
+$(OBJDIR)/%$(DOTEXE): private extra_dflags += -g
+
+# Tell make how to produce the .dSYM from an object file
+$(OBJDIR)/%$(DOTEXE).dSYM: $(OBJDIR)/%$(DOTEXE)
+	dsymutil $<
+
+# Defines how to run the tests; they require the executable to be built and dsymutil to have been run.
+# And force use this rule instead of the one created by common.mak
+$(TESTS:%=$(OBJDIR)/%.done): $(OBJDIR)/%.done: $(OBJDIR)/%$(DOTEXE) $(OBJDIR)/%$(DOTEXE).dSYM
+	@echo Testing $*
+
+# Run the program and capture stderr
+# Print an error or something if the program didn't fail
+	if $(TIMELIMIT)$< 2>$(OBJDIR)/$*.stderr; then \
+		echo "Program completed unexpectedly. It should have failed" ; \
+		exit 1 ; \
+	fi
+
+# Check the stderr for a pattern
+	if ! grep -q "^object.Exception@src/my_example.d(10): Test$$" $(OBJDIR)/$*.stderr; then \
+		echo "The stderr is not alright" ; \
+		cat $(OBJDIR)/$*.stderr ; \
+		exit 1 ; \
+	fi
+
+# Or check for an exact file match, using sed to remove unportable address and other stuff
+# If you do this add `%.expected` to the target for make to properly track the dependency
+	if ! sed \
+		"s|^.*/src/|src/|g; s/\[0x[0-9a-f]*\]/\[ADDR\]/g; s/scope //g; s/Nl//g" $(OBJDIR)/$*.stderr \
+		| diff -q $*.expected -; then \
+		echo "The stderr did not match $*.expected exactly" ; \
+		cat $(OBJDIR)/$*.stderr ; \
+		exit 1 ; \
+	fi
diff --git a/runtime/druntime/test/dsymutil/my_example.expected b/runtime/druntime/test/dsymutil/my_example.expected
new file mode 100644
index 0000000000..acc96ac820
--- /dev/null
+++ b/runtime/druntime/test/dsymutil/my_example.expected
@@ -0,0 +1,4 @@
+object.Exception@src/my_example.d(10): Test
+----------------
+src/my_example.d:10 void my_example.myFunction() [ADDR]
+src/my_example.d:5 _Dmain [ADDR]
\ No newline at end of file
diff --git a/runtime/druntime/test/dsymutil/src/my_example.d b/runtime/druntime/test/dsymutil/src/my_example.d
new file mode 100644
index 0000000000..30e8600295
--- /dev/null
+++ b/runtime/druntime/test/dsymutil/src/my_example.d
@@ -0,0 +1,11 @@
+import std.stdio;
+import core.sys.darwin.dlfcn;
+
+void main() {
+	myFunction();
+}
+
+
+void myFunction() {
+	throw new Exception("Test");
+}

I'm not on a mac so I couldn't test it at all but you can try using this template.

the-horo avatar Jul 07 '25 16:07 the-horo

Tried adapting the suggestion to the exception tests; seems it just broke them instead.

LunaTheFoxgirl avatar Jul 07 '25 18:07 LunaTheFoxgirl

Can you try:

$(OBJDIR)/%$(DOTEXE).dSYM: $(OBJDIR)/%$(DOTEXE)
	dsymutil $<

ifeq ($(OS),osx)
tests_without_exe = line_trace_21656 rt_trap_exceptions_drt_gdb
exes = $(filter-out $(tests_without_exe),$(TESTS))
$(exes:%=$(OBJDIR)/%.done): $(OBJDIR)/%.done: $(OBJDIR)/%$(DOTEXE) $(OBJDIR)/%$(DOTEXE).dSYM
$(OBJDIR)/line_trace_21656.done: $(OBJDIR)/line_trace$(DOTEXE).dSYM
$(OBJDIR)/rt_trap_exceptions_drt_gdb.done: $(OBJDIR)/rt_trap_exceptions_drt$(DOTEXE).dSYM
endif

edit: ifeq ($(OS),osx) instead of ifeq ($(OS),"osx")

the-horo avatar Jul 07 '25 18:07 the-horo

worked for me: https://github.com/the-horo/ldc/actions/runs/16126084384

the-horo avatar Jul 08 '25 04:07 the-horo

ok the failing tests there are unrelated to the PR and related to dynamic compile

LunaTheFoxgirl avatar Jul 08 '25 10:07 LunaTheFoxgirl

Are there any plans to support reading the debug info from object files as well?

jacob-carlborg avatar Jul 08 '25 14:07 jacob-carlborg

> Are there any plans to support reading the debug info from object files as well?

This would be far more involved work. For now I think just using dSYM is the most sensible way. Especially if we add support in dub for automating this step.

LunaTheFoxgirl avatar Jul 08 '25 14:07 LunaTheFoxgirl