Modify ocamldebug to handle Dune workspace mapping
In order to produce reproducible builds (independent of the location of the build), Dune since version 3.0 has mapped references to the workspace directory to “/workspace_root”. The source/load path directories contained in debug information produced by ocamlc were affected by this change for users who build their projects with Dune. As a result, ocamldebug is sometimes unable to find the program source code.
Starting in Dune 3.7, a dune-project stanza was added to suppress the workspace mapping.
(map_workspace_root <bool>)
This enables users to get programs that ocamldebug can handle, at the expense of not having reproducible builds.
Another workaround is for the ocamldebug user to add -I <dir> options showing where the sources are.
Note that if the project uses bin and lib directories, as suggested by dune init project, then the source file references in the debug information will be things like "bin/main.ml" or "lib/mod.ml". Currently, the Emacs "ocamldebug" command that is used to start ocamldebug changes to the directory of the bytecode file, e.g. _build/default/bin. From that directory, it will not be able to find the sources. If, before doing anything that causes the program to load, the user uses "cd" to move up to the directory containing "_build", then the relative paths in the debug events will work. That is yet another workaround for users.
This feature request proposes an ocamldebug change that is relatively simple and would work most of the time. Since it is Dune that is causing the problem, it will not occur unless the user is using Dune to build their project. In that case, the build artifacts are contained in a "_build" directory (at least by default). ocamldebug can look at the location of the binary being debugged (or of additional binaries loaded dynamically). If they are contained in a "_build" directory, then the occurrences of "/workspace_root" can be mapped to the directory containing the _build directory. That should restore the directories to what they were without the mapping.
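A minimal sketch of the proposed heuristic, assuming the bytecode file lives somewhere under a directory literally named _build. The helper names workspace_of_exe and remap_dir, and the path /home/me/proj, are hypothetical; this is an illustration, not the code from the actual PR.

```ocaml
(* Hypothetical helper: walk up from the executable's directory and return
   the parent of the nearest enclosing "_build" directory, if there is one. *)
let workspace_of_exe (exe : string) : string option =
  let rec up dir =
    let parent = Filename.dirname dir in
    if Filename.basename dir = "_build" then Some parent
    else if parent = dir then None (* reached "/" or "." without a match *)
    else up parent
  in
  up (Filename.dirname exe)

(* Hypothetical helper: replace a leading "/workspace_root" component of a
   debug directory with the real workspace root; other paths are unchanged. *)
let remap_dir ~(root : string) (dir : string) : string =
  let prefix = "/workspace_root" in
  let lp = String.length prefix and ld = String.length dir in
  if ld >= lp
     && String.sub dir 0 lp = prefix
     && (ld = lp || dir.[lp] = '/')
  then root ^ String.sub dir lp (ld - lp)
  else dir

let () =
  match workspace_of_exe "/home/me/proj/_build/default/bin/main.bc" with
  | Some root -> print_endline (remap_dir ~root "/workspace_root/lib/lib1")
  | None -> print_endline "not inside a _build directory"
(* prints /home/me/proj/lib/lib1 *)
```

The separator check in remap_dir keeps a directory such as /workspace_rootx from being rewritten by accident.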
I have a version of ocamldebug that has been so modified and plan to submit a PR soon.
I have submitted #12085 to address this issue.
ocamldebug should already support reproducible builds based on the BUILD_PATH_PREFIX_MAP standard (https://reproducible-builds.org/specs/build-path-prefix-map/). Could you try to see if you can make use of that, and report back whether it works or not?
It certainly doesn't seem like a good idea to add dune-specific support to ocamldebug.
| Could you try to see if you can make use of that, and report back whether it works or not?
I suspect that it won't work. The toolchain doesn't really implement the standard, you can't remap absolute paths, see https://github.com/ocaml/ocaml/issues/8665.
Btw, unless I missed it, the support doesn't seem to be documented anywhere.
My understanding (please correct me if I'm wrong) is that BUILD_PATH_PREFIX_MAP is used to make the toolchain generate "sanitized" paths (eg not containing paths that depend on the local build machine), but here the problem is the inverse one, ie to have ocamldebug read such sanitized paths and somehow find the "actual" files in the local build machine. I don't see how BUILD_PATH_PREFIX_MAP could help in this case (even if it did more rewriting than what it does today).
| But here the problem is the inverse one, ie to have ocamldebug read such sanitized paths and somehow find the "actual" files in the local build machine.
Well, you could remap these sanitized paths to actual ones by simply inverting a map specified in BUILD_PATH_PREFIX_MAP.
Yes, but this map is not stored anywhere (storing it in the artifacts would negate the whole point of sanitizing the paths in the first place), so there isn't anything for ocamldebug to invert to begin with, right?
| so there isn't anything for ocamldebug to invert to begin with, right?
You can invoke ocamldebug with an appropriate BUILD_PATH_PREFIX_MAP.
OK, I see what you mean. Assuming BUILD_PATH_PREFIX_MAP were to be patched to rewrite absolute paths, we would also have to patch ocamldebug to apply this rewriting when looking up source files, if I'm understanding correctly.
| Assuming BUILD_PATH_PREFIX_MAP were to be patched to rewrite absolute paths, we would also have to patch ocamldebug to apply this rewriting when looking up source files, if I'm understanding correctly
That's the idea.
To be honest, I think you can do without the rewriting of absolute paths, though I would personally REALLY like to see it happen :-) as the current support is useless for me.
But for this problem I think you'd just need to patch ocamldebug to look up BUILD_PATH_PREFIX_MAP, invert the map and apply it when looking up files.
Isn't this the most direct fix for this problem?
At least it would avoid hard-coding specific build system assumptions and terminology upstream. Your build system, which likely knows the concrete BUILD_PATH_PREFIX_MAP, can then set up an environment for you to launch ocamldebug.
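A minimal OCaml sketch of the inversion suggested above, assuming the encoding from the BUILD_PATH_PREFIX_MAP spec: entries separated by ':', each of the form target=source, with "%#", "%+" and "%." escaping '%', '=' and ':'. The function names are hypothetical and this is not ocamldebug's code.

```ocaml
(* Decode one prefix using the spec's escapes:
   "%#" -> '%', "%+" -> '=', "%." -> ':'. Malformed input raises Failure. *)
let decode_prefix (s : string) : string =
  let n = String.length s in
  let buf = Buffer.create n in
  let rec loop i =
    if i < n then begin
      if s.[i] <> '%' then begin
        Buffer.add_char buf s.[i];
        loop (i + 1)
      end else if i + 1 >= n then
        failwith "truncated escape"
      else begin
        (match s.[i + 1] with
         | '#' -> Buffer.add_char buf '%'
         | '+' -> Buffer.add_char buf '='
         | '.' -> Buffer.add_char buf ':'
         | c -> failwith (Printf.sprintf "invalid escape %%%c" c));
        loop (i + 2)
      end
    end
  in
  loop 0;
  Buffer.contents buf

(* Split the variable's value into (target, source) pairs; empty entries
   are skipped and every other entry must contain an '='. *)
let parse_map (value : string) : (string * string) list =
  String.split_on_char ':' value
  |> List.filter (fun entry -> entry <> "")
  |> List.map (fun entry ->
         match String.index_opt entry '=' with
         | None -> failwith ("missing '=' in entry: " ^ entry)
         | Some i ->
           let target = decode_prefix (String.sub entry 0 i) in
           let source =
             decode_prefix (String.sub entry (i + 1) (String.length entry - i - 1))
           in
           (target, source))

let has_prefix ~prefix s =
  String.length prefix <= String.length s
  && String.sub s 0 (String.length prefix) = prefix

(* Inverse rewrite: if [path] starts with a pair's target (the sanitized
   prefix stored in the debug info), replace it with that pair's source
   (the real local prefix). Later entries take precedence, so scan from
   the right. *)
let invert_rewrite (map : (string * string) list) (path : string) : string =
  let rec go = function
    | [] -> path
    | (target, source) :: rest ->
      if has_prefix ~prefix:target path then
        source ^ String.sub path (String.length target)
                   (String.length path - String.length target)
      else go rest
  in
  go (List.rev map)

(* Example with a hypothetical checkout at /home/me/proj. *)
let () =
  let map = parse_map "/workspace_root=/home/me/proj" in
  print_endline (invert_rewrite map "/workspace_root/lib/lib1/mod.ml")
(* prints /home/me/proj/lib/lib1/mod.ml *)
```

In a real debugger the value would come from Sys.getenv_opt "BUILD_PATH_PREFIX_MAP" rather than a literal. The sketch matches plain string prefixes, so a stricter version might also check for a path separator after the target.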
@lthls, @dbuenzli, @nojb, @dra27: Thank you for your feedback.
I have been thinking this over and doing some more investigation.
Regarding:
| It certainly doesn't seem like a good idea to add dune-specific support to ocamldebug.
Dune now seems to be the de facto way forward for building OCaml projects. Its integration with Merlin and VSCode makes browsing OCaml projects "just work", whereas projects that do not use Dune usually require a lot of fiddling with .merlin files. Even now I have trouble browsing the OCaml source code (go-to-definition usually does not work) because it is not built with Dune (though I see there are efforts toward that end). Before Dune introduced the mapping, ocamldebug worked "out of the box" for Dune-built programs. I don't think it is unreasonable to make a change such as the one I'm suggesting, so that users don't have to change how they invoke ocamldebug.
For a typical Dune build, say of a module in lib/lib1/mod.ml, the compiler is currently producing a directory list like this for the module:
Debug directories=
/workspace_root
/workspace_root/lib/lib1
/workspace_root/lib/lib1/.lib1.objs/byte
and the file part of the locations for an individual debug_event has information like this:
File "lib/lib1/mod.ml".
Without my proposed fix, ocamldebug will find the sources only if it is lucky, e.g. if:
- The cwd is the workspace root. In this case, it will find the original sources.
- The cwd is a directory like _build/default, in which case ocamldebug will find the copy of the source in the build directory. This can be annoying because, if you are using Emacs with tuareg, it will ask whether you want to go to the original copy.
- The user supplied -I root, where root is the workspace root. This in effect makes the directories supplied by the compiler unnecessary.
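To see why only these cases work, here is a simplified, hypothetical model of the lookup (not ocamldebug's actual search code): try the file name relative to the current directory, then relative to each directory in the search list. The existence check is passed in as a function so the example runs without touching the file system, and the extra -I directory is simply prepended here.

```ocaml
(* Simplified model: return the first candidate path that exists. *)
let find_source ~(exists : string -> bool) ~(dirs : string list) (file : string)
    : string option =
  let candidates = file :: List.map (fun d -> Filename.concat d file) dirs in
  List.find_opt exists candidates

let () =
  (* Pretend the only copy of the source is in a hypothetical checkout. *)
  let exists path = path = "/home/me/proj/lib/lib1/mod.ml" in
  let dirs =
    [ "/workspace_root"; "/workspace_root/lib/lib1";
      "/workspace_root/lib/lib1/.lib1.objs/byte" ]
  in
  let show = function Some p -> p | None -> "not found" in
  (* Debug directories alone: every candidate starts with /workspace_root. *)
  print_endline (show (find_source ~exists ~dirs "lib/lib1/mod.ml"));
  (* Adding the workspace root, as "-I root" would, makes the lookup succeed. *)
  print_endline
    (show (find_source ~exists ~dirs:("/home/me/proj" :: dirs) "lib/lib1/mod.ml"))
(* prints "not found", then /home/me/proj/lib/lib1/mod.ml *)
```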
But with my proposed fix, ocamldebug will again be able to find the sources without special user intervention. To me, this definitely seems worthwhile.
The above is correct for debugging a module from its build directory. But what about after the package is installed? If, with Dune, one specified mypkg as the name of the package containing the lib1 library, then when installed the sources will end up in a location like _install/lib/mypkg/lib1/mod.ml, where _install is the specified or default prefix (usually something like .opam/someswitch).
We can see that this is a problem, because there is no prefix that we can add to lib/lib1/mod.ml to match _install/lib/mypkg/lib1/mod.ml. We could try to strip off a lib/ from the debug_event path, and then look for that in the installation directories. If the debug information has names and digests of the source files, then perhaps there would be some way to find the source files reliably in the installation (e.g. opam) directory.
So I don't have a solution for finding the sources after the library is installed. Either Dune needs to arrange for the installation image to be isomorphic to the original source tree, so that the user could specify -I options for the library modules of interest, or perhaps there could be an option to try stripping off the top-level directory from the debug_events paths. This could even be interactive: for example, if ocamldebug cannot find the source for a file, it could let the user say where it is.
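The stripping idea could look like the following. It is a hypothetical heuristic, not something ocamldebug does today, and pkg_dir stands for the installed package directory (for example _install/lib/mypkg above).

```ocaml
(* Drop the first component of a relative debug-event file name,
   e.g. "lib/lib1/mod.ml" -> Some "lib1/mod.ml"; no '/' means nothing to strip. *)
let strip_toplevel (file : string) : string option =
  match String.index_opt file '/' with
  | Some i when i > 0 ->
    Some (String.sub file (i + 1) (String.length file - i - 1))
  | _ -> None

(* Candidate location of the source inside an installed package. *)
let installed_candidate ~(pkg_dir : string) (file : string) : string option =
  Option.map (Filename.concat pkg_dir) (strip_toplevel file)

let () =
  match installed_candidate ~pkg_dir:"_install/lib/mypkg" "lib/lib1/mod.ml" with
  | Some p -> print_endline p
  | None -> print_endline "no candidate"
(* prints _install/lib/mypkg/lib1/mod.ml *)
```

An interactive variant would fall back to asking the user only when no candidate exists on disk.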
In any case, the common case is the user debugging their own code. We should definitely support that.
@lthls, @dbuenzli, @nojb, @dra27:
As a follow-on to the preceding: if a project avoids putting all its libraries under a lib subdirectory and instead gives each library its own subdirectory named after it, then the source at mylib1/mod1.ml will have mylib1/mod1.ml in the debug_events, and when installed it will be at a location like _install/mypkg/mylib1/mod1.ml. ocamldebug would then be able to find the sources if the user supplied -I _install/mypkg to ocamldebug.
So it seems that following such a policy would be good in order to allow the sources of installed libraries to be found.
| - If the user supplied -I root, where root is the workspace root. This in effect makes the directories supplied by the compiler unnecessary.
If this is enough to make ocamldebug work, I don't understand why we need to change ocamldebug at all. Passing a single -I flag seems pretty simple for the user to do.
@nojb, true, if they know to do it. If my suggested fix is not accepted, then the least we could do is modify ocamldebug to warn the user about what they need to do to get it working. The warning could be triggered if some debug directory starts with "/workspace_root" and ocamldebug then tries and fails to find a source file. The OCaml and Dune documentation should also be updated to tell users about this situation and what to do.
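A sketch of the trigger condition described above; the function name and message text are hypothetical, not an existing ocamldebug diagnostic.

```ocaml
let has_prefix ~prefix s =
  String.length prefix <= String.length s
  && String.sub s 0 (String.length prefix) = prefix

(* Warn only when the lookup failed and some debug directory still carries
   Dune's "/workspace_root" placeholder. *)
let workspace_warning ~(dirs : string list) ~(found : string option)
    (file : string) : string option =
  let mapped = List.exists (fun d -> has_prefix ~prefix:"/workspace_root" d) dirs in
  match found with
  | None when mapped ->
    Some (Printf.sprintf
            "Cannot find %s: the debug information refers to /workspace_root. \
             Try passing -I <workspace root> to ocamldebug." file)
  | _ -> None

let () =
  match workspace_warning ~dirs:[ "/workspace_root/lib/lib1" ] ~found:None "lib/lib1/mod.ml" with
  | Some msg -> prerr_endline msg
  | None -> ()
```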
My preference is still to make it so the user does not have to do it, as that seems more user-friendly.
I've also been thinking more about finding the sources when a library (or program) is installed. In that case, since Dune knows where things are coming from and where they are going, I think Dune should arrange to rewrite the debug information so that it contains the absolute paths of the install destination, and also resolve the paths in the debug_events, e.g. mapping lib/lib1/mod.ml to lib1/mod.ml as appropriate. That would probably require extending one of the OCaml tools to do such rewriting.
| @nojb, true, if they know to do it.
I don't see how passing -I to the debugger is in any way different from passing -I to the compiler. We don't expect the compiler to magically know which directories to add to the include path; similarly I don't see why we should expect the debugger to be any different. This sounds more like a job for Dune itself. Perhaps Dune should have a command dune debug <exe> that would pass the necessary -I flag. Moreover, Dune would know about installed vs local libraries, and could tailor the flags accordingly. The compiler lacks too much information, so it just doesn't feel like the right place at which to fix this issue. Do you agree?
| Do you agree?
@nojb, my main problem is that this is a breaking change, so we either need to fix it so the user does not need to change their way of doing things or else we need to decide on a new strategy (such as your suggested "dune debug
Actually your "dune debug
On the other hand, if Dune arranged for installed libraries to have the correct absolute file locations, then something like "dune debug
For now, the Dune 3.7 documentation does state that mapping workspace root breaks debug information, so people can just disable the mapping when they are debugging. Users of Dune 3.0 to 3.6 will see breakage without warning, and just feel frustrated with ocamldebug.
So I guess I'm willing to abandon this issue unless someone else thinks it has enough merit to keep. I'll wait a while before closing.
Personally I still think that having ocamldebug apply the inverse of the map found in the BUILD_PATH_PREFIX_MAP environment variable is a good idea.
This makes the toolchain work consistently with the mechanisms it supports.
Having a dune debug command feels very much in the spirit of Dune to me. Dune already makes it hard to run a compiled executable manually, but provides dune exec for this purpose.
| Having a dune debug command feels very much in the spirit of Dune to me.
@xavierleroy, true. But how would you feel about having dune install arrange for the debug information to be rewritten so the paths refer to absolute paths in the install directory, e.g. .opam/<switch>? We could have a tool, e.g. ocamledit, that could copy .cmo or .cma files, editing the debug information and debug directories. Then the executable bytecode would have the correct paths and there would be no need for dune debug. On the other hand, I don't entirely understand what dune exec does that executing the bytecode file directly does not, so perhaps dune debug is needed.
Since you have entered the conversation, I wonder if I could have your opinion on some other questions.
- How would you feel about having ocamldebug built standalone, using compiler-libs, rather than as part of the OCaml build? I have some ideas for enhancements to ocamldebug, e.g.
  - Fix setting a breakpoint at a function. Currently a breakpoint can only be set if the function is in the local environment, but the debug information contains enough to identify the functions in advance, so the usual method of setting a breakpoint on a function before starting the program would work.
  - Provide more information in various places, such as:
    - An info functions command to show the available functions.
    - Show the stack and heap at the current location, i.e. the variables available in the current scope.
    - A command to print the value of all variables available at the current location.
- How would you feel about having ocamlobjinfo built standalone, using compiler-libs, but also enhanced to be more complete, merging in the ability to dump bytecode instructions from dumpobj?
I already have such stand-alone utilities (with the features mentioned above) that I have extracted:
- https://github.com/richardlford/socamldebug (standalone ocamldebug) and
- https://github.com/richardlford/ocamldumper (combines ocamlobjinfo and dumpobj)
One goal of having these be standalone is so that they can be installed at multiple versions of ocaml, using ppx_optcomp to make adjustments as needed. That way as enhancements are added, they can be available to those who are not at the latest ocaml version.
Thanks for your feedback.
| Personally I still think that having ocamldebug apply the inverse of the map found in the BUILD_PATH_PREFIX_MAP environment variable is a good idea. This makes the toolchain work consistently with the mechanisms it supports.
@dbuenzli, that would work for the user code, but not for libraries that had been installed (assuming the installed libraries also used workspace mapping). One idea I've been thinking about is that the debug directories should be relative, with the package name at the top, e.g. mypkg/lib1. Then consumers of such debug directories could prefix them with an implicit .opam/<switch>. That way the debug directories for different packages would be different. It also seems that instead of adding all the directories into a single directory list, one should track which module goes with which directory path. For example, pkg1 and pkg2 could both have a file mod.ml. If the debug directories are lumped together, then mod.ml would be found in the first location, whereas if the directories were associated with modules that would not be a problem. That is a potential problem in the current implementation. I'll see if I can make a test case to reproduce it.
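A minimal sketch of the per-module association, using a string map from compilation unit name to a package-relative directory that is resolved against a library prefix only at lookup time. The unit names, directories and the prefix /home/me/.opam/default/lib are hypothetical, and the sketch assumes unit names are unique within one program.

```ocaml
module SMap = Map.Make (String)

(* Resolve a compilation unit's source directory against the local lib prefix. *)
let source_dir ~(lib_prefix : string) (table : string SMap.t) (modname : string)
    : string option =
  Option.map (Filename.concat lib_prefix) (SMap.find_opt modname table)

let () =
  let table =
    SMap.(empty |> add "Mod" "mypkg/lib1" |> add "Other" "otherpkg/lib2")
  in
  match source_dir ~lib_prefix:"/home/me/.opam/default/lib" table "Mod" with
  | Some d -> print_endline d
  | None -> print_endline "unknown module"
(* prints /home/me/.opam/default/lib/mypkg/lib1 *)
```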
| That is a potential problem in the current implementation. I'll see if I can make a test case to reproduce it.
@dbuenzli, I tried to make two packages, ambig1 and ambig2, each with the same module, lib1.ml, which was "private" according to Dune and was referenced from the entry points of the packages. I thought I might be able to build a test app that used both packages. However, I get a message that ambig1.cmi and ambig2.cmi make inconsistent assumptions over interface Lib1. So it appears it is not possible to have an executable that links more than one module with the same simple name. Do you think that is true? If so, then just having a list of directories is sufficient, although mapping modules to directories would also be correct (and possibly more efficient if the list was long). Do you agree?
@richardlford, I have no idea. dune works hard to make you believe in a world that doesn't exist by doing a lot of (distasteful, if you ask me) renaming churn with the sources, and I'm not familiar with the details.
But one thing for sure is that if you end up in a state where you have two compilation units with the same name you won't be able to link both in the same program.
| One idea I've been thinking about is that the debug directories should be relative with the package name at the top, e.g. mypkg/lib1. Then consumers of such debug directories could prefix with an implicit .opam/<switch>
It doesn't have to be relative; it could be a constant "virtual root" like /ocaml, but yes, I think it would be better for libraries to compile using a BUILD_PATH_PREFIX_MAP so that the resulting paths in the virtual root mirror the way they are going to be installed in a lib prefix at install time. That way you can easily reroot /ocaml to the lib prefix of your choice when you get such paths.
See the discussion at https://github.com/ocaml/ocaml/issues/12106#issuecomment-1469766370. The conclusion is that ocamldebug should support BUILD_PATH_PREFIX_MAP as previously suggested. I will take that approach.
This issue has been open one year with no activity. Consequently, it is being marked with the "stale" label. What this means is that the issue will be automatically closed in 30 days unless more comments are added or the "stale" label is removed. Comments that provide new information on the issue are especially welcome: is it still reproducible? did it appear in other contexts? how critical is it? etc.
A very shallow look at this issue leaves me with the impression that it should be closed because, if there is something missing for Dune to work well, the solution does not seem to lie in adding anything to ocamldebug.
What do others think?
I think the overall objective "Allow ocamldebug to work with Dune workspace mapping" is a reasonable one... i.e. allowing enough generic support in ocamldebug for this. There are also open PRs against this issue. There are various pieces of activity going on with debugging at the moment, so I think we can risk waiting to see if the Stale bot is the next commenter...