bazel
bazel copied to clipboard
Windows, C++: investigate '/experimental:deterministic' compiler flag
Feature requests: what underlying problem are you trying to solve with this feature?
Investigate MSVC's /experimental:deterministic flag's effect, maybe use it in MSVC crosstool. (Maybe put it behind a Bazel experimental flag.)
Motivation: to make C++ builds deterministic. For example, MSVC doesn't support overriding non-deterministic macros like __DATE__, maybe this flag solves that problem.
What operating system are you running Bazel on?
Windows 10
What's the output of bazel info release?
release 0.29.1
/cc @meteorcloudy @hlopko
We have no experience so far. From https://developercommunity.visualstudio.com/idea/426033/support-for-cls-experimentaldeterministic-becoming.html, looks like this is not yet documented anywhere and is still under development?
I found https://developercommunity.visualstudio.com/content/problem/502528/changing-pdb-source-file-paths-with-experimentalde.html which says:
"""PDBs are not deterministic and aren't the intended target of this feature, so there may still be a great many full paths still embedded in the PDBs after doing this. These switches are intended for object file and executable determinism, not PDBs."""
Since executables embed a ref to the PDB in /debug build, it looks like deterministic builds with debug information remain impossible with MSVC.
(Plug: They are possible with clang-cl and lld-link, see http://blog.llvm.org/2019/11/deterministic-builds-with-clang-and-lld.html for how.)
Some other maybe interesting links:
- https://developercommunity.visualstudio.com/content/problem/573023/absolute-paths-for-associated-pdb-files-are-record.html
- https://developercommunity.visualstudio.com/content/problem/589675/implicitly-inline-methods-that-required-non-paged.html
I've been playing around with some trial and error and found that:
d1nodatetime: undefs__DATE__,__TIME__,__TIMESTAMP__. user may redefine them.Brepro(should be supplied to all of cl, lib, and link): impliesd1nodatetimeand sets initial value of those#definesequal to 1. Additionally modifies some data stored in object files to make them binary-identical between builds.experimental:deterministicimpliesBreproand issues a warning if problematic code is found (usage of__TIME__, etc.)
Using Brepro, INCREMENTAL:NO, and Z7, I'm able to get outputs to be binary-identical between builds on the same computer, with some exceptions: If a .manifest and/or resource file (.rc) is processed during compilation/linking, it creates a temporary file during processing, and for whatever reason, this temporary path is included in the resulting pdb (obviously the temporary has been deleted by the time the build is done, so...pointless). This can be "solved" by just omitting the .rc or using a side-by-side .manifest instead of embedding it. However, pdbs with these different temporary paths inside are completely interchangeable - the difference does not prevent a debugger loading it for example.
d1trimfile:<path[=ENV_VAR]> (seen in above links) sounds like it was originally intended to trim the __FILE__ definition, which it does seem to do in my testing (handy if you use FC, I guess). As discussed in above links, they seem to have made it apply to paths stored in .obj files referencing a pdb as well, which occurs on non-Z7 builds, i.e. Zi builds. Unfortunately my quick test still shows full pdb path being in the eventual exe file.
However...why is this a problem? The differences in the pdb can be seen to be inconsequential and the exe will be identical apart from the pdb path, which is accounted for by existing tools (like debuggers and symsrv, which dont use a path to index the pdb).
Maybe make a request to msvc team to have d1trimfile also apply to the pdb path stored in exe? Since they recently made a related change perhaps it just needs to be asked for.
(You can make the pdb path in the exe relative with /pdbaltpath:%_PDB%.)
@nico what exactly is `%_PDB%? Where it comes from? I wonder how to I change Bazel MSVC toolchain to use it?
@konste: You put that there literally. See https://docs.microsoft.com/en-us/cpp/build/reference/pdbaltpath-use-alternate-pdb-path?view=msvc-160
I got /pdbaltpath:%_PDB% working in a toolchain file:
flag_set(
actions = all_link_actions,
flag_groups = [
flag_group(
flags = [
"/Brepro",
# Tell the linker to embed only the .pdb filename instead of
# the full path.
# Note that % has a special meaning for bazel so we need to
# double it to pass a literal % to the linker.
"/PDBALTPATH:%%_PDB%%"
],
),
],
),
I still have some trouble fitting /d1trimfile into the toolchain file.
@dgrunwald did you get anywhere with /d1trimfile in the toolchain?
No. As far as I can tell, the only possible workaround is to wrap the compiler with another binary that turns bazel-friendly command line arguments into actual MSVC commandline arguments.
@dgrunwald thanks. I've got this working:
clwrap.bat
@set CLEXE=C:/apps/MVS174/VC/Tools/MSVC/14.34.31933/bin/HostX64/x64/cl.exe
@%CLEXE% /d1trimfile:%cd% %*
The next problem we are facing is that Bazel's use of /Fo
One more flag that can help: https://developercommunity.visualstudio.com/t/The-result-of-FILE-is-not-lower-case/1392988?q=associate+to+test+case
-d1ForceLowerCaseNames
I was experimenting with this recently using MSVC 2022 (compiler version 19.36.32535) and here is what I found.
/experimental:deterministic plus /Brepro given to both compiler and linker achieves the goal of building the primary binary (such as DLL) deterministically, i,e. binaries built on the different machines are identical.
Also /d1trimfile works as expected regardless of /Fo presense which is good. The only catch is the prefix to remove must be specified verbatim, i.e. with backslashes, not forward slashes. The problem: we cannot specify the prefix on the command line as it would immediately make the compilation action non-cacheable unless output_base is set identically everywhere. I am thinking about doing it through the .params file but I am not sure if its content is taking into account for the cache key calculation or not. Also when we set the prefix in the toolchain and let it create .params file the prefix gets escaped (backslashes get doubled) and it is enough for the prefix being not verbatim and /d1trimfile stops working.
Next observation: .obj files for some reason include absolute path to themselves. I don't know why and what what to do with it.
Also PDB files appear to be very different when produced on the different machines.
So there is a progress, but not the final solution unfortunately.
Great, information, thanks for sharing. This sounds very similar behaviour to VS2019. I am working around the d1trimfile issue in two ways:
Approach 1) as mentioned above, write a clwrap.bat which figures out the trim directory and adds it to the command line, call clwrap.bat instead of cl.exe in the toolchain (we take a copy of Bazel's autogenerated toolchain and check it in to our codebase). Approach 2) Write cl.cxx, compile it to cl.exe, and pass it into the toolchain as above.
Approach 1 worked fine and was simple, but confuses coverity's bazel integration. Coverity introspects the build and expects to see something called exactly "cl.exe" invoked. So we are now following Approach 2.
@konste did you try a compile without /Fo? If so, does the absolute path disappear from obj? That is what I see.
BuildXL uses subst to map out the current folder (build/src) into an stable one.
subst B:\ d:\p\code
Then builds from B:\ - and paths are seeing as if coming from there.
One can go even further (maybe) and split - say source code comes from a:, and buid is in b:, etc.
On the other hand C# has /pathmap to remap one or more roots to another stable name, I guess C++ (MSVC) is just missing this....
Approach 1) as mentioned above, write a clwrap.bat which figures out the trim directory and adds it to the command line, call clwrap.bat instead of cl.exe in the toolchain (we take a copy of Bazel's autogenerated toolchain and check it in to our codebase). Approach 2) Write cl.cxx, compile it to cl.exe, and pass it into the toolchain as above.
I understand the idea of the compiler wrapper (cmd or exe), but it does not help because of the full path to object files inside object files. By the way how you handle obj files at execution root when you compile without /Fo? Do you use /Z7 or /Zi? We use /Z7 to avoid Bazel killing mspdbsrv.exe
@konste did you try a compile without /Fo? If so, does the absolute path disappear from obj? That is what I see.
Unfortunately both with and without /Fo I see absolute path in obj :-( It is very weird that we see different results here. Wonder if the difference is in the compiler version or some other options because /Fo does not seem to make a difference for me.
On the other hand C# has /pathmap to remap one or more roots to another stable name, I guess C++ (MSVC) is just missing this....
Interestingly I found pathmap string inside MSVC compiler binaries and when I tried to add it to the command line I got
cl : Command line error D8053 : argument to /pathmap must be of the form STR1=STR2 where STR1 is not empty
which makes me believe that it could be supported to some degree, just badly undocumented anywhere. Do you have a plan how can we use it to make MSVC build cacheable?
I've tried a simple reproducer, and I was using pathmap, and that does seem to be the key: Preconditions: download sysinternals strings and put at d:\bin
set DIR=d:\workdir\deterministic
mkdir %DIR%\out
cd %DIR%
echo #include ^<stdio.h^> >hello.cxx
echo. >> hello.cxx
echo int hello() { >> hello.cxx
echo printf("hello"); >> hello.cxx
echo return 1; >> hello.cxx
echo } >> hello.cxx
cl /experimental:deterministic /pathmap:%DIR%\out=out /c hello.cxx /Fo:%DIR%\out\hello.obj
d:\bin\strings out\hello.obj | grep hello
outputs:
out\hello.obj
hello
?hello@@YAHXZ
$unwind$?hello@@YAHXZ
$pdata$?hello@@YAHXZ
Tested on:
- Microsoft (R) C/C++ Optimizing Compiler Version 19.40.33521 for x86
- Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30153 for x64
Note: I am also using d1trimfile and Brepro, I recall they help in other scenarios.
I was using pathmap, and that does seem to be the key
But we still need compiler wrapper script to set pathmap right?
But we still need compiler wrapper script to set pathmap right?
Yes
Holy... didn't know this worked for C++ too! Thanks for exploring!!!
BTW link.exe also have undocumented /pathmap parameter and it helped with .exp file (not like we need it) but PDB files are still a problem - they have absolute path to execroot embedded anyway. Ideas?
PDB is non-deterministic for us too. My hope is that as it is a leaf artifact this will not matter. Do you see it as being a practical issue in your deployment?