mason
Ensuring consistent compiler version for linux binaries
Mason should be more strict about using consistent compiler versions for all mason binaries.
This ticket describes why this is important, gives some background, proposes ideas for solving it, and lists related issues.
Context
We've been mixing clang 3.5 and 3.8 binaries for a while now in mason for Linux packages. Old packages are built with 3.5, and newly published ones are built with 3.8. This is not okay going forward. Recently an issue arose in mapnik-vector-tile on Linux that indicates rare bugs can surface when clang++ 3.5 and 3.8 binaries are mixed: https://github.com/mapbox/mapnik-vector-tile/issues/216. In this case thousands of unit tests were passing and only one was failing. But that failing test could only be fixed by rebuilding all binaries to consistently use clang 3.8. So the tentative takeaway is that mixing clang 3.5 and 3.8 binaries works 99% of the time. That is not good enough. For a binary deployment solution to be solid the binaries need to work 100% of the time.
Background on compiler defaults per platform
linux
Currently we are targeting clang 3.8. But 3.8.1 and 3.9.0 are now out and we should get a plan in place to upgrade to those compilers safely. Per the Context above, we have evidence to suggest that mixing binaries between clang++ versions (at least 3.5 <-> 3.8) is not safe. At this point we don't have any evidence that mixing binaries between clang (any version) and gcc 5 is problematic (as long as -D_GLIBCXX_USE_CXX11_ABI=0 is set when using gcc 5 to ensure linking works).
macOS and iOS
Currently most .travis.yml files target osx_image: xcode7.3. We have no evidence to suggest there are problems mixing binaries between Xcode versions, but ideally we come up with a good mechanism to track the latest stable, which is now osx_image: xcode8.
Android
We use the cross compiler so this is already explicit and controlled, no action needed.
Ideas for improving mason
- Add dev mode to mason core to enable easily rebuilding all packages against a new compiler without overwriting any existing binaries. Applications using Mason should be able to opt-in to the freshly compiled binaries when upgrading their mason version.
- For builds: add an environment assertion that runs before packages are built and asserts that:
  - CXX and CC environment settings make sense
  - CXX --version and CC --version make sense
- Add post-build assertions that test the GLIBC and GLIBCXX requirements and ensure they are no greater than expected
- Vet every .travis.yml for consistency of format and toolchain bootstrap.
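The pre-build and post-build assertions from the list above could look roughly like the following. This is only a sketch, not something that exists in mason: the function names are made up, it assumes the compiler prints "clang version X.Y" in its --version output, and it assumes the post-build check is fed the text of objdump -T for the built library. The expected versions are whatever the package set agrees on and are not taken from mason.

```python
import re
import subprocess


def clang_version(version_output):
    """Return (major, minor) from `clang++ --version` text, or None if absent."""
    match = re.search(r"clang version (\d+)\.(\d+)", version_output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def assert_compiler(cxx, expected):
    """Pre-build check: fail unless `cxx --version` reports the expected clang."""
    output = subprocess.run(
        [cxx, "--version"], capture_output=True, text=True, check=True
    ).stdout
    found = clang_version(output)
    if found != expected:
        raise SystemExit(f"{cxx}: expected clang {expected}, found {found}")


def max_symbol_version(objdump_output, prefix="GLIBCXX"):
    """Return the highest PREFIX_x.y.z version referenced in `objdump -T` text."""
    found = re.findall(rf"\b{prefix}_(\d+(?:\.\d+)*)", objdump_output)
    versions = [tuple(int(part) for part in v.split(".")) for v in found]
    return max(versions, default=None)


def assert_max_symbol_version(objdump_output, prefix, allowed):
    """Post-build check: fail if the binary needs a newer version than allowed."""
    needed = max_symbol_version(objdump_output, prefix)
    if needed is not None and needed > allowed:
        raise SystemExit(f"{prefix} {needed} required, but only {allowed} allowed")
```

A build script would call assert_compiler with the value of CXX and the pinned version before compiling, and assert_max_symbol_version on the objdump -T output of each produced library afterwards. A check like the first one would have caught the silent fallback to a system compiler.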
/cc @kkaefer @jfirebaugh @jakepruitt @tmpsantos @karenzshea
My next action on this is to branch mason, tweak the PLATFORM_ID, and experiment safely with rebuilding all packages with clang 3.8 without modifying existing ones. In the process I will update old .travis.yml files that are out of date, and report back about ways I can think of to keep things in sync. So, in short, I'll experiment with some pre-actions for solving this issue and testing new compilers and then report back on actual plans for more feedback.
Noting to myself as I start looking for glitches in packages: one is when install is overridden like https://github.com/mapbox/mason/blob/7cc64b1f32930d0de99381e21dd2cab9f474fa4f/scripts/nunicode/1.7.1/.travis.yml#L39. In this case clang++-3.8 is not getting installed and the build is falling back to the system clang++-3.4: https://travis-ci.org/mapbox/mason/jobs/161267885#L295
-D_GLIBCXX_USE_CXX11_ABI=0
This flag doesn't work and it is dangerous.
Right now my mapbox-gl-native is crashing at runtime because we recently upgraded geojson-cpp to a version that was built with -D_GLIBCXX_USE_CXX11_ABI=1.
It links and it runs but leaves you with a corrupted binary with a crash that is hard to trigger or debug. For instance the unit tests are passing and we do go through code paths using geojson-cpp in the tests. It takes a specific GeoJSON source to trigger the crash.
On Linux, it seems to me the safe bet is to use precompiled binaries built on the target machine or let mason build from the sources.
For the record the crash looks like this if you ever encounter something similar:
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc
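One way to spot this kind of mismatch before it crashes is to look at the symbol names. Code built with -D_GLIBCXX_USE_CXX11_ABI=1 that passes std::string or std::list across its interface carries __cxx11 (or the B5cxx11 tag) in its mangled symbols. Below is a rough sketch that assumes you feed it the text output of nm for each library; the function names and the example symbols are made up for illustration. A library that never exposes those types shows no marker either way, so a False here is a hint and not proof of the old ABI.

```python
import re

# Markers libstdc++ adds to mangled names under the new (C++11) ABI.
CXX11_ABI_MARKERS = re.compile(r"__cxx11|B5cxx11")


def uses_cxx11_abi(nm_output):
    """Return True if any symbol in the `nm` text carries a new-ABI marker."""
    return bool(CXX11_ABI_MARKERS.search(nm_output))


def abi_report(nm_outputs):
    """Map each library name to whether new-ABI markers were found in it."""
    return {name: uses_cxx11_abi(text) for name, text in nm_outputs.items()}


def is_mixed(report):
    """Return True if some libraries show new-ABI markers and others do not."""
    return len(set(report.values())) > 1


# Hypothetical symbols: the same function taking std::string under each ABI.
new_abi = "0000 T _ZN7example5parseERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE"
old_abi = "0000 T _ZN7example5parseERKSs"
report = abi_report({"libnew.a": new_abi, "libold.a": old_abi})
print(report, "mixed:", is_mixed(report))
```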
@tmpsantos thanks for the details. I'll repeat them back to ensure I've got them right.
- geojson-cpp binary is built on ubuntu precise (https://github.com/mapbox/mason/blob/7cc64b1f32930d0de99381e21dd2cab9f474fa4f/scripts/geojson/0.3.0/.travis.yml#L21-L23) with -D_GLIBCXX_USE_CXX11_ABI=1 (https://github.com/mapbox/mason/blob/7cc64b1f32930d0de99381e21dd2cab9f474fa4f/scripts/geojson/0.3.0/script.sh#L4) and g++-5 on travis (https://github.com/mapbox/mason/blob/7cc64b1f32930d0de99381e21dd2cab9f474fa4f/scripts/geojson/0.3.0/.travis.yml#L54-L57).
- That binary crashes when used with mapbox-gl-native on linux. That mapbox-gl-native is compiled with -D_GLIBCXX_USE_CXX11_ABI=1 I presume and also g++-5? Which version does g++-5 --version report exactly?
Noting: will not consider moving to clang 3.9.x until https://github.com/mapbox/variant/issues/121 is solved.
That binary crashes when used with mapbox-gl-native on linux. That mapbox-gl-native is compiled with -D_GLIBCXX_USE_CXX11_ABI=1 I presume and also g++-5? Which version does g++-5 --version report exactly?
@kkaefer how did you build the geojson 0.3.0 package?
https://github.com/mapbox/mason/blob/master/scripts/geojson/0.3.0/script.sh 😛
https://github.com/mapbox/mason/blob/master/scripts/geojson/0.3.0/script.sh
Hehe, but on a VM with Ubuntu Precise + g++-5 like our Travis bots?
https://github.com/mapbox/mason/blob/master/scripts/geojson/0.3.0/.travis.yml#L21-L30 😛
Update here:
- We are now targeting clang++ 3.9.1 for all newly built packages
- There have been no reported problems with mixing binaries across clang++ versions (which we are still technically doing)
- It is still desirable in my mind to rebuild all packages against 3.9.1, but this has not happened yet. Because travis trigger rate limits at 10 builds, we'd need to build on our own hardware. I've had this on my todo list, but it grows by the day and this is not at the top of the list.
Update again:
- We are now targeting clang++ 5.0.0 for all newly built packages: https://github.com/mapbox/mason/pull/512
- There have still been no obvious problems with mixing binaries across recent clang++ versions (> 3.9). The only known problem was the original one I reported in the description with mapnik-vector-tile.
- The GLIBCXX issue with g++ still warrants investigation, but feels low priority since mbgl now depends primarily on header-only source libraries from mason.
Overall my feeling on this issue has shifted over time. I used to think it was critical that we provide a way to recompile all binaries with each new compiler version. While that is a meaningful goal, I now think it is out of scope.