Reproducible builds
We should do what we can to support reproducible builds. However, I think it's more practical to wait for better tooling than to start dumping compilers and other tools into the Git repository.
Right now I think LibreSSL is the biggest offender. #92
I assume that you have looked at Gitian (the reproducible build system that Bitcoin Core uses). With Gitian, you typically use something like Bitcoin Core's depends system (located in the depends directory in its source tree) to download and build dependencies from source, and then build the actual software against those freshly built dependencies. So you shouldn't need to dump tools directly into this repository. Rather, you would just have makefiles, for instance, that download and build the dependencies.
You mention compilers. I don't know whether you are thinking about it in terms of including a particular compiler so that everyone is using the same compiler or if you mean that people would need to build the compiler from source. In the former case, with Gitian, that is unnecessary, because people would be using a virtual machine of a particular release of Ubuntu and they would use the compiler included with that release of Ubuntu. In the latter case, I would consider whether any advantages that there may be to compiling the compiler from source are great enough to overcome the issues arising from a much longer build time as a result of compiling the compiler. It is admirable to want to mitigate the Trusting Trust Attack by using Diverse Double Compilation (if that is what you intend), but the reality is that if no one builds a program reproducibly because it takes too long to do so, there is no advantage to making that program reproducible.
I think it is great that you are considering reproducible builds. I also agree that the tooling can be improved, but I think the need for improvement has a lot to do with the limited use of reproducible builds. Let me know if you have any questions about reproducible builds and I will try to help as best I can.
Thanks for the comment!
Let me start by saying I know very little about reproducible builds. I've read a bit about NixOS and saw the news about Debian. I hadn't heard of Gitian but I've started looking into it.
One of the reasons I am interested in reproducible builds, and the biggest reason I think they are important for StrongLink right now (when the whole project is so limited and immature), is that bitwise reproduction is also important for content addressing systems. For example, if you want to import an arbitrary web page into StrongLink, there is probably dynamic/random content on that page that should not affect its identity (e.g. comments, ads). So there needs to be a way to keep that content from changing the page's hash, and there is probably a lot to learn from reproducible build systems. (I have some ideas there.)
For a while I was interested in what I called semantic hash algorithms, which would strip out irrelevant information before passing the data to a regular (i.e. cryptographic) hash function. I ended up deciding that approach was probably bad though.
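To make the "semantic hash" idea concrete, here is a minimal sketch in Python. It is only an illustration of the general approach described above, not anything StrongLink implements; the function name and the patterns for "irrelevant" content (HTML comments and a div with class "ad") are hypothetical.

```python
import hashlib
import re

# Hypothetical patterns for content that should not affect identity.
# A real page would need far more careful (and more fragile) rules.
VOLATILE_PATTERNS = [
    re.compile(r"<!--.*?-->", re.DOTALL),                 # HTML comments
    re.compile(r'<div class="ad">.*?</div>', re.DOTALL),  # assumed ad container
]

def semantic_hash(page: str) -> str:
    """Strip volatile content, normalize whitespace, then apply SHA-256."""
    for pattern in VOLATILE_PATTERNS:
        page = pattern.sub("", page)
    normalized = " ".join(page.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Two fetches of the "same" page that differ only in ads and a comment.
first = '<p>Article text</p><div class="ad">Buy now!</div>'
second = '<p>Article text</p>\n<div class="ad">Different ad</div><!-- served 12:01 -->'
print(semantic_hash(first) == semantic_hash(second))
```

The sketch also hints at why the approach is awkward: the result depends entirely on the stripping rules, so two parties only get the same hash if they agree on exactly the same rules.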
Honestly I was planning on just packaging for Debian and then waiting to get included in their reproducible builds project. Gitian sounds pretty easy to set up though (although I haven't looked in depth yet). Here is a question though: how valuable are reproducible builds if they are not even for the user's system (e.g. OS X or Windows) or architecture? Doesn't building in a VM limit the applicability? Perhaps this is answered in the Gitian docs...
Diverse double compiling would be nice, but given the other security problems of this project and its relatively low value as a target (compared to, say, Bitcoin), I don't think it can be a priority right now.
Obviously the VM can run other operating systems, including the same as the host. Never mind that question!
Getting a package into Debian and having it be made reproducible through their reproducible builds project sounds like a good idea. Their toolchain sits in a special repository and consists of modified versions of packages such as dpkg and debhelper. I think the goal is to ultimately get these changes to the toolchain integrated into the official Debian repositories. But right now, you are able to add this additional repository to a Debian-based distro and use the normal process of packaging software for Debian. A lot of the issues that are being discovered by Debian that need to be solved to make a build reproducible are relevant even if you are building for OS X or Windows. But the Debian reproducible build project itself makes some changes to the tools used to build a deb package. So it doesn't fully solve OS X or Windows reproducible builds.
Gitian is pretty easy to set up. It just uses Ubuntu VMs to do the building, so you need to cross-compile for OS X or Windows. In what is called a Gitian descriptor, you specify the version of Ubuntu to use and some other data, including a script for Gitian to run in the VM. That script does the building and copies the output to the output directory. You end up with the output and a file that lists the filenames of the inputs and outputs along with their hashes, as well as the packages installed in the VM. Here is the Linux Gitian descriptor for Bitcoin Core. As you can see, it uses libfaketime and the --mtime option to tar, among other things, to make the build reproducible. Once the descriptor is written, you get people to build the software and publish the hashes they get.
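As a rough illustration of why a fixed modification time matters for archives, the sketch below builds a tar file in Python with a pinned mtime, sorted member order, and fixed ownership fields, so the same inputs always give the same bytes. This is not what the Bitcoin Core descriptor does (it calls tar with --mtime); the function name and the timestamp are arbitrary choices for the example.

```python
import hashlib
import io
import tarfile
from typing import Dict

# Arbitrary fixed timestamp (2000-01-01 00:00:00 UTC), chosen for this sketch.
FIXED_MTIME = 946684800

def deterministic_tar(files: Dict[str, bytes]) -> bytes:
    """Pack files into an uncompressed tar whose bytes depend only on names and contents."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):          # fixed member order
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = FIXED_MTIME        # same role as tar's --mtime
            info.mode = 0o644
            info.uid = info.gid = 0         # do not leak the builder's identity
            info.uname = info.gname = ""
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()

archive_a = deterministic_tar({"b.txt": b"two", "a.txt": b"one"})
archive_b = deterministic_tar({"a.txt": b"one", "b.txt": b"two"})
print(hashlib.sha256(archive_a).hexdigest() == hashlib.sha256(archive_b).hexdigest())
```

Without the pinned fields, each builder's clock, user name, and directory listing order would end up in the archive and change its hash.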
So with Gitian, the "hard part" is that you would have to write the descriptor and ideally have some way of building the dependencies also (maybe using the Bitcoin Core depends system). The process of actually using Gitian to build once the descriptor is made is easy.
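The last step of the process, where people build independently and publish their hashes, amounts to checking that everyone got the same digest for each output file. A minimal sketch of that check follows; the data layout, builder names, and file names are hypothetical and do not reflect Gitian's actual result file format.

```python
from typing import Dict, List

def disagreeing_outputs(reports: Dict[str, Dict[str, str]]) -> List[str]:
    """Return output filenames for which builders reported more than one distinct hash.

    `reports` maps a builder name to a {filename: hex digest} mapping.
    """
    seen: Dict[str, set] = {}
    for outputs in reports.values():
        for filename, digest in outputs.items():
            seen.setdefault(filename, set()).add(digest)
    return sorted(name for name, digests in seen.items() if len(digests) > 1)

# Hypothetical published results from three independent builders.
reports = {
    "alice": {"app-linux64.tar.gz": "aa11", "app-win64.zip": "bb22"},
    "bob":   {"app-linux64.tar.gz": "aa11", "app-win64.zip": "bb22"},
    "carol": {"app-linux64.tar.gz": "aa11", "app-win64.zip": "ff99"},
}
print(disagreeing_outputs(reports))
```

An empty list means every builder reproduced every output; any filename in the list points at either a non-deterministic build step or a compromised builder.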
Right now we vendor all of our non-base dependencies (see the deps dir) and statically link them. We build them basically through recursive Make. It sounds like that might actually be an advantage for using Gitian, as long as the dependencies themselves build cleanly in that environment.
I'm not a build system expert by any means and I haven't tried out native cross-compiling yet, so that might be a long way off.
For now I will download Gitian and give it a go. Thanks for your advice!