MergeFromFail() leaks full path into binary
I noticed using the strings tool that my full path to the proto .cc files was being leaked into the binary even while compiling in release and stripping all debug data.
After debugging a bit I believe that the problem comes from the function static void MergeFromFail(int line) whose implementation is:
static void MergeFromFail(int line) { ::google::protobuf::internal::MergeFromFail(__FILE__, line); }
What does that function do anyway? Can I replace the __FILE__ macro with something else without breaking everything? I don't want my username and project paths to leak into my binaries.
Thanks in advance.
Reading the code it seems that MergeFromFail just does print an error and exit(1) so I guess I will just sed over the generated code and replace the macro with something else, however I really think that the compiler should write the filename there instead of using the __FILE__ macro. If I have some time I'll try to submit a patch.
Fortunately this problem will be going away soon. We removed MergeFromFail in Google's internal fork as part of a performance optimization and we will soon merge our internal changes into master. You can see those changes on my branch here (https://github.com/acozzette/protobuf/tree/down-integrate); as soon as I get a couple things straightened out I should be able to merge that.
Thanks! However I'm starting to think that the leak is not only in there because I removed the __FILE__ macros from my generated source files and recompiled and the path still shows up in the binary. I'm sure it's only in the protobuf part because it shows up only in protobuf's .o files...
After playing a bit with the source code (master branch, I haven't tried your branch yet) I found that the macro is all over the place in the protobuf headers themselves (not only the generated code) so unless those get deleted, the full path is going to leak into the binaries.
However, if I redefine the __FILE__ macro at compile time, which I should not because it's undefined behaviour [c99, 6.10.8p4], then the binaries come out clean, so the only proper option I see to remove the macro itself from protobuf's header files.
I suggest that for example if the NDEBUG macro is defined, the error and log messages suppress the use of __FILE__ and __LINE__.
At the moment, every use of GOOGLE_LOG()/GOOGLE_CHECK() will bring in FILE and LINE. It might be feasible to define GOOGLE_LOG()/GOOGLE_CHECK() in a way that in release build they don't use FILE/LINE, but it's hard to guarantee we don't introduce FILE again in the future.
g++ seems to define __FILE__ so that it matches the file name you passed to it. So if you pass "file.cc" then __FILE__ is just "file.cc" but if you pass "/absolute/path/file.cc" then __FILE__ is "/absolute/path/file.cc". Fortunately we pass only the relative paths during the build and so __FILE__ doesn't leak the name of your home directory or anything. Here's a log statement for example that shows only the relative path:
[libprotobuf ERROR google/protobuf/descriptor.cc:3422] Invalid proto descriptor for file "foo.proto":
@nake90, in that case is this good enough for your use case?
@acozzette, I'm using cmake and it passes full paths to g++ so I can't easily change that. There is a setting to change it but it seems to work only on small, simple, projects [1]. I will try to see if I can manage to use that but I feel it is more a workaround than fixing the problem itself, and it will give trouble to other people with more complex projects.
EDIT: I actually can't use that. It seems a quite broken part of cmake and it is really recommended to be avoided. I tried it in my project and it did not fix the issue.
@xfxyjwf, what about adding one test that does something like strip a test binary compiled in release mode and then grep the output of the strings command? That way the tests would fail if a full path is used.
Thanks for all the help.
[1] https://cmake.org/cmake/help/v3.0/variable/CMAKE_USE_RELATIVE_PATHS.html
With C++11 we could use constexpr functions to strip the leading directories at compile time. I think this will be hard to fix until we have C++11 though. The log messages are intended to contain the filenames; this is useful.
Perhaps if you want to avoid this you could compile with make instead of cmake? I believe make will not pass absolute paths, so it shouldn't leak the enclosing directory.
If this isn't possible, perhaps you could compile your release build from /tmp? It's not ideal, but it might be your best option for now.
I ended up doing a horribly bad hack in cmake to fix this:
set_source_files_properties(${SOURCE_FILES} PROPERTIES COMPILE_FLAGS "${CXXFLAGS} -s -Wno-builtin-macro-redefined -U__FILE__ -D__FILE__='\"$(subst ${CMAKE_SOURCE_DIR}/,,$(abspath $<))\"'")
Basically: tell gcc to ignore builtin macro redefinitions and redefine FILE with a local path instead of the global one.
Hope that this helps someone, but I am hoping that we can get this done in the near future in a more-correct way. The constexpr idea seems good but if you have files with the same name in different folders you would lose the local path. However I think it is good enough for most situations.
I keep thinking that we can't possibly be the only project that has had this problem; I would be interested to hear about how other projects handle it and whether there is some established best practice. Deterministic builds seem to be becoming more popular these days and they must have some kind of solution to this. Or maybe it's just a CMake limitation and other build systems don't really have to worry about this.
Back when I was searching how to fix this (if my memory is correct), people were using small hacks and such for cmake because the cmake devs say that it would be very difficult to implement that directly on their side. Also, traditional makefile projects don't have this problem because they can control how the compiler gets called so that's not an issue for them.
Regarding best practices... don't know about other people but I usually remove the debug information like filenames and paths if the macro NDEBUG is set. If the macro is not set, then my error and logs have the filename on it. If the macro is set, then the logs and errors only contain the message itself and maybe the function name.
We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please add a comment.
This issue is labeled inactive because the last activity was over 90 days ago.
We triage inactive PRs and issues in order to make it easier to find active work. If this issue should remain active or becomes active again, please reopen it.
This issue was closed and archived because there has been no new activity in the 14 days since the inactive label was added.