
Independently repeatable fully deterministic compilation

Open tildelowengrimm opened this issue 10 years ago • 29 comments

It should be possible for independent folks to compile SecureDrop from source and achieve exactly the same binary. Once this is possible, SecureDrop's normal release process should rely on multiple independent builders.

For added safety, the normal release process should also expect that others are secretly building SecureDrop the same way, and provide a mechanism for raising an alert if the official builds are incorrect. This mechanism should be tested regularly (but not often). In a test, all the official builders collude to make an innocuous change (perhaps the addition of whitespace) and publish their build. Then you see how long it takes for someone to sound the alarm.

tildelowengrimm, Feb 19 '14 21:02

Note that SecureDrop is not compiled; it is a Python web application (at the moment; this may change somewhat for 1.0, but it probably still won't be natively compiled). What constitutes a test of determinism in that case? It seems like we would need to compare directory layouts, possibly of the whole system, and compare file hashes (as file integrity checkers like Tripwire do) to have a solid guarantee that no component of the system has been compromised.

garrettr, Feb 19 '14 22:02

You could take a hash of the output of find . -type f -print0 | LC_ALL=C sort -z | xargs -0 cat. The output should be byte-for-byte identical if the directory structure and contents of files are identical.
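A rough Python equivalent of that pipeline, as a sketch assuming a POSIX system (the function name `tree_content_digest` is illustrative): it walks the tree, skips anything that is not a regular file (as `find -type f` does), sorts the relative paths byte-wise the way `LC_ALL=C sort -z` does, and feeds every file's bytes into a single SHA-256.

```python
import hashlib
import os


def tree_content_digest(root):
    """SHA-256 over the concatenated contents of every regular file under
    root, in byte-wise sorted path order. Roughly mirrors:
    find . -type f -print0 | LC_ALL=C sort -z | xargs -0 cat | sha256sum
    """
    rel_paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # find -type f matches regular files only, not symlinks to them.
            if os.path.isfile(full) and not os.path.islink(full):
                # Sort as bytes so the order matches LC_ALL=C (byte order).
                rel_paths.append(os.fsencode(os.path.relpath(full, root)))
    rel_paths.sort()

    digest = hashlib.sha256()
    for rel in rel_paths:
        with open(os.path.join(os.fsencode(root), rel), "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                digest.update(chunk)
    return digest.hexdigest()
```

Like the shell version, this hashes file contents only, not names. Identical trees therefore always give identical digests, but the converse is not guaranteed: renaming a file without changing the sort order leaves the digest unchanged. Feeding each `rel` path into the hash before the file's contents would close that gap.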

diracdeltas, Feb 19 '14 22:02

Note that if you have non-ASCII filenames, this will fail when you do the comparison across filesystems that use different Unicode normalization forms (http://en.wikipedia.org/wiki/Unicode_equivalence#Normalization). I ran into this when trying to make reproducible builds of HTTPS Everywhere across OS X and GNU/Linux systems; I solved it by converting filenames to NFD before sorting them.
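To illustrate the normalization problem, a minimal Python sketch (the helper names are hypothetical). The name "é.txt" can be stored precomposed (NFC, as typically created on GNU/Linux) or decomposed (as HFS+, OS X's filesystem at the time, stores names, a variant of NFD). The two spellings sort differently byte-wise, which changes the order in which file contents get concatenated. Normalizing to NFD before encoding gives both sides the same sort key.

```python
import unicodedata


def canonical_sort_key(path):
    """Byte-wise sort key that is the same whether the filesystem returned
    the precomposed (NFC) or decomposed (NFD) spelling of a name."""
    return unicodedata.normalize("NFD", path).encode("utf-8")


def canonical_order(paths):
    return sorted(paths, key=canonical_sort_key)


# The same two files as listed on each platform.
linux_names = ["f.txt", "\u00e9.txt"]   # precomposed U+00E9
mac_names = ["f.txt", "e\u0301.txt"]    # "e" + combining acute accent

# Naive byte-wise sorting puts them in different orders:
naive_linux = sorted(linux_names, key=lambda n: n.encode("utf-8"))  # f.txt first
naive_mac = sorted(mac_names, key=lambda n: n.encode("utf-8"))      # é.txt first

# With NFD keys both platforms agree (é.txt first on both).
```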

diracdeltas, Feb 19 '14 23:02

Well, there's the .pyc bytecode. I guess the hashing approach would be akin to telling Rootkit Hunter to check the SecureDrop path for modified files.

ageis, Feb 20 '14 09:02

We don't ship any .pyc bytecode. If people want to check the integrity of their SecureDrop install at runtime, it'd be slightly tricky because we would have to ignore the randomly generated keys. But I think Tom's issue here is about defending against malicious tampering with the SecureDrop code before it reaches the end user, not about defending the end user against compromise of their machines.
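For the runtime case, a Tripwire-style per-file manifest makes it easy to skip files that legitimately differ between installs. A sketch, assuming the generated keys live under a path matched by a glob such as `keys/*` (a hypothetical layout, not SecureDrop's actual one); `build_manifest` and `diff_manifests` are illustrative names.

```python
import fnmatch
import hashlib
import os


def build_manifest(root, ignore=()):
    """Map each regular file's relative path to its SHA-256 digest,
    skipping any path that matches one of the glob patterns in `ignore`."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not os.path.isfile(full):
                continue
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            if any(fnmatch.fnmatch(rel, pattern) for pattern in ignore):
                continue  # e.g. per-install key material
            h = hashlib.sha256()
            with open(full, "rb") as f:
                for chunk in iter(lambda: f.read(65536), b""):
                    h.update(chunk)
            manifest[rel] = h.hexdigest()
    return manifest


def diff_manifests(expected, actual):
    """Return (missing, added, modified) sets of relative paths."""
    missing = set(expected) - set(actual)
    added = set(actual) - set(expected)
    modified = {p for p in set(expected) & set(actual) if expected[p] != actual[p]}
    return missing, added, modified
```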

diracdeltas, Feb 20 '14 19:02

If literally everything is interpreted rather than compiled, could this proposal be completely implemented through the use of gpg-signed git tags for releases?

tildelowengrimm, Feb 20 '14 19:02

I think this is useful in addition to checking signed git tags.

  1. In case the signing key is compromised.
  2. GPG key management is hard; people might get tricked into importing a fake key into their keyring.
  3. Say that in the future, we want people to be able to use SecureDrop by downloading a release as a .zip file instead of having to use git clone.

diracdeltas, Feb 20 '14 19:02

Could this proposal be completely implemented through the use of gpg-signed git tags for releases?

Possibly. We already do this (see the last release's signed tag). There are some compiled components of SecureDrop, though: for example, some of the Python dependencies compile their own shared libraries (I know scrypt does this). So we might need to take a mixed approach.

It also seems like a large part of this is dependent on the rest of the system being verifiable. What does it matter if the web application is good, if the web server binary is compromised?

garrettr, Feb 20 '14 23:02

Say that in the future, we want people to be able to use SecureDrop by downloading a release as a .zip file instead of having to use git clone.

We already do this. There's a signed tar.gz on the Freedom of Press's SecureDrop site.

garrettr, Feb 20 '14 23:02

It seems like the goal is to be able to compare the software that goes into production with a publicly auditable copy (like this GitHub repo). So maybe what we need to do is develop a SecureDrop .deb package, and have an automated process for comparing downloads of that package to what can be built from the latest release on GitHub.
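The core of that automated comparison could be as small as this sketch (function names hypothetical): hash the package downloaded from the official source and one rebuilt locally from the tagged release, and accept only a byte-for-byte match.

```python
import hashlib


def sha256_file(path):
    """Stream a file through SHA-256 and return the hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published(downloaded_deb, rebuilt_deb):
    """True only if the published package and the local rebuild are
    byte-for-byte identical."""
    return sha256_file(downloaded_deb) == sha256_file(rebuilt_deb)
```

When the digests differ, a tool such as diffoscope can show where the two packages diverge.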

Beyond that, we'll have to rely on other projects (Debian, Tor, etc.) to get the rest of our dependencies in a verifiable state. We already host some binaries ourselves (e.g. OSSEC) to reduce the risk of tampering by third parties. I'm not sure if it would be better or worse to host copies of all of our dependencies.

garrettr, Feb 20 '14 23:02

I think it's definitely a bad idea to host all the dependencies. It's completely reasonable to rely on the integrity of dependencies (and to choose dependencies based on their perceived integrity). SecureDrop should focus on making SecureDrop verifiable. Otherwise, where does it end? Should SecureDrop be in the business of verifying the microcode updates on my BIOS and CPU?

That said, if SecureDrop has a particular target platform, it might be reasonable for the installation procedure to first check the integrity of the platform against various known sorts of problems. Obviously, if the platform is well-and-truly pre-owned, that won't work, but it might protect against misconfiguration and improve SecureDrop's ability to detect future attacks or compromise.

tildelowengrimm, Feb 20 '14 23:02

SecureDrop targets 64-bit Ubuntu 12.04, so the same packages should always work. I think we should trust the integrity of the official repositories. I wonder if it would be possible to develop a .deb that accomplishes everything that production_installation.sh does... sounds like a fun task.

ageis, Feb 21 '14 11:02

@garrettr wrote:

We already do this. There's a signed tar.gz on the Freedom of Press's SecureDrop site.

Let's make that tar.gz file deterministic as well as signed (in case the machine that is doing the packaging is compromised with malware that modifies local files before signing).

I played around with timestamp modification for a while and couldn't get deterministic archives with GNU tar; can we just use zip instead? https://github.com/devrandom/gitian-builder/blob/master/bin/canon-zip

(The size difference between tar czf and zip -9 is not too bad; 23.4M vs 23.5M)
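As a rough illustration of deterministic zipping in Python (not a port of the linked script), this sketch builds a zip whose bytes depend only on file names and contents: every entry gets the same fixed timestamp, entries are written in sorted order, and host-dependent metadata is pinned. The function name, the 1980-01-01 timestamp and the 0644 permissions are assumptions chosen for the example.

```python
import io
import os
import zipfile

# Earliest date the ZIP format can represent; used for every entry.
FIXED_DATE_TIME = (1980, 1, 1, 0, 0, 0)


def deterministic_zip(root):
    """Return zip bytes for the regular files under root that do not depend
    on file mtimes, directory iteration order, or the building OS."""
    rel_paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.isfile(full):
                rel_paths.append(os.path.relpath(full, root).replace(os.sep, "/"))

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for rel in sorted(rel_paths):  # fixed entry order
            info = zipfile.ZipInfo(rel, date_time=FIXED_DATE_TIME)
            info.compress_type = zipfile.ZIP_DEFLATED
            info.create_system = 3  # record "Unix" regardless of host OS
            info.external_attr = 0o644 << 16  # normalized permission bits
            with open(os.path.join(root, rel), "rb") as f:
                zf.writestr(info, f.read(), compresslevel=9)  # like zip -9
    return buf.getvalue()
```

The compressed bytes may also vary with the zlib version, so independent builders would want to pin that part of the toolchain too.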

diracdeltas, Mar 20 '14 06:03

can we just use zip instead?

Sure!

garrettr, Mar 25 '14 04:03

I had no idea that Tar wasn't deterministic!

tildelowengrimm, Mar 25 '14 05:03

gzip's default behavior is to include the timestamp of the uncompressed file, which for tarballs is the timestamp of the intermediate tar archive. That means that your tarball is including the timestamp of the build run, which should explain why your bytes are different on each build even if the source is unchanged. Try this instead if you still want to use tar+gz:

```sh
tar cf foobar.tar foobar/
gzip -n foobar.tar
```

Also see http://superuser.com/questions/705877/compressing-compressed-tar-gz-files-deterministically
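`gzip -n` removes the gzip timestamp, but when different people build the archive, the tar headers still record each file's mtime (for a fresh git checkout, the time the file was written), its owner and group, and entries appear in directory-read order. A Python sketch that normalizes all of these; `deterministic_targz` and the `foobar` top-level directory are illustrative, and file permission bits are kept as-is, so builders would need the same umask.

```python
import gzip
import io
import os
import tarfile


def _normalize(info):
    # Strip per-build and per-builder metadata from each tar header.
    info.mtime = 0
    info.uid = info.gid = 0
    info.uname = info.gname = ""
    return info


def deterministic_targz(root, arcroot="foobar"):
    """Return .tar.gz bytes for root that don't depend on when, or by whom,
    the archive was built."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            entries.append(os.path.relpath(os.path.join(dirpath, name), root))

    buf = io.BytesIO()
    # mtime=0 is the in-Python counterpart of `gzip -n`: no timestamp in the
    # gzip header. A BytesIO has no name, so no file name is stored either.
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=0) as gz:
        with tarfile.open(fileobj=gz, mode="w") as tar:
            for rel in sorted(entries):  # fixed member order
                arcname = os.path.join(arcroot, rel).replace(os.sep, "/")
                tar.add(os.path.join(root, rel), arcname=arcname,
                        recursive=False, filter=_normalize)
    return buf.getvalue()
```

Newer GNU tar releases offer `--sort=name`, `--mtime`, and `--owner=0 --group=0 --numeric-owner` for the same purpose on the command line.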

dpkp, Apr 12 '15 19:04

This issue was opened a while ago, and things have obviously changed since it was last discussed. Whatever the past, it now makes sense to work on making our .deb packages reproducible. Everything else is already deterministic in the sense that our repository contains no executable binaries. Here is Debian's guide to doing so: https://wiki.debian.org/ReproducibleBuilds/Howto. Since we have automated the process of building the packages we ship by using a 'build' VM provisioned by Ansible, it should be easy to standardize package versions for our toolchain and make the other changes necessary so that anyone can build SecureDrop packages reproducibly from the comfort of their beds.

psivesely, May 14 '16 04:05

Also, note at some point down the road we would like to migrate to Debian. This may not be for years due to the massive migration and support costs, but perhaps sometime before April 2019, when the current Ubuntu LTS hits EOL. The amd64 testing repository of Debian (i.e., Debian 9/Stretch) is already at 89.5% reproducibility (https://tests.reproducible-builds.org/reproducible.html). Thus, a down-to-the-kernel reproducible SD app and mon server is possible in the coming years. I could also say we should go further and recommend the ASUS KGPE-D16 for which we could reproducibly build a binary blobless (i.e., no Intel ME, FSP, VBIOS, or CPU microcode updates) version of coreboot called libreboot for a down-to-the-boot-firmware reproducible system (but I won't go there, I'll just link it for fun and interest https://libreboot.org/docs/hcl/kgpe-d16.html).

psivesely, May 14 '16 04:05

Okay, so I really want to figure out how to do this and am going to make it happen by the 0.4 milestone. I've been making some small steps towards getting this working, and wheel has recently been patched to support reproducible .whl builds https://bitbucket.org/pypa/wheel/pull-requests/52/apply-the-debian-patch-for-reproducible/diff. From some tests and analysis with diffoscope (which I'm still learning to use), there still seem to be a number of things we'll need to change to get reproducible builds working.

psivesely, May 20 '16 09:05

I re-evaluated whether this is a good use of time in the coming months and decided it is not. SD might never reach a 0.4 release. We are in the early planning phase of SD 1.0, which will be a huge rearchitecture of the system, and even the application code itself may be rewritten in another language.

I still think that reproducibility is important for both stability and security, so I will not close this issue or unassign myself. Rather, I think this issue should be left open and revisited many months from now, when 1.0 is out.

psivesely, Jun 13 '16 23:06

Looping back on this, as I just finished developing a deterministic build environment for another project (take a look if you're curious) and have become familiar with Gitian. I think the same tools can be adapted for SecureDrop. For those that aren't familiar with how it works, you have a list of inputs (dependencies, OS packages, SecureDrop source code) which are all hashed. You then do a build and you get some outputs, e.g. the SecureDrop .deb packages, which are also hashed. You sign the resulting "manifest" and push it to a public GitHub repository which contains all of the builder's signatures for each release. The builders compare their manifests, and if there is a difference, you know there is some indeterminism or a modification somewhere.

What you need:

  • A build environment. I use a Vagrant VM that runs an LXC guest, using the container to do the actual build.
  • A Gitian descriptor: a recipe or build script that accomplishes the build while removing sources of irreproducibility, e.g. timestamps (using faketime). Examples: Bitcoin, Tor Browser.
  • A helper script that wraps around gitian-builder for kicking off builds, signing and committing manifests, and verifying.
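The verification step, where builders compare their manifests, can be sketched in Python as follows. This assumes each builder publishes a manifest in `sha256sum` output format; it is not Gitian's own file format or tooling, and the function names are hypothetical.

```python
from collections import defaultdict


def parse_manifest(text):
    """Parse sha256sum-style lines ("<hex digest>  <file name>") into a dict."""
    entries = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, name = line.split(None, 1)
        name = name.strip()
        if name.startswith("*"):  # sha256sum's binary-mode marker
            name = name[1:]
        entries[name] = digest.lower()
    return entries


def find_disagreements(manifests):
    """Given {builder: manifest_text}, return {output: {digest: [builders]}}
    for every output on which the builders do not all agree. An output that
    a builder did not produce at all is recorded under "<missing>"."""
    parsed = {builder: parse_manifest(text) for builder, text in manifests.items()}
    outputs = set().union(*(entries.keys() for entries in parsed.values()))
    problems = {}
    for output in sorted(outputs):
        by_digest = defaultdict(list)
        for builder, entries in sorted(parsed.items()):
            by_digest[entries.get(output, "<missing>")].append(builder)
        if len(by_digest) > 1:
            problems[output] = dict(by_digest)
    return problems
```

Any non-empty result is the signal to investigate: either some nondeterminism remains in the build, or one builder's output was modified. Signing each manifest (Gitian uses GPG for this) is what ties a digest to a specific builder; this sketch leaves that out.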

The question is, what are the outputs that you want to build deterministically? I assume they are the SecureDrop application code .debs. The best way to find out if there are any determinism issues in your package or the Python dependencies is to just dive in. This is not that hard; what you will need is a new Vagrant VM for Gitian building, plus an Ansible role for provisioning it. We could do this in a new branch here, or in a separate source code repository.

ageis, Oct 16 '16 22:10

@conorsch @garrettr maybe a regression on this topic is affecting the latest release?

I see that the current release, 0.3.10, includes .pyc files.

I was just fixing this in GlobaLeaks, following the Debian guideline to run compileall in the postinst script, and identified this change between 0.3.9 and 0.3.10.

evilaliv3, Dec 09 '16 22:12

@evilaliv3 The .pyc files shouldn't have made it into the deb package at all. Before making headway on deterministic builds, we need to spend some time improving the linting around the current package-building logic so that it conforms to best practices. Then we'll be in a much better position to seriously attempt deterministic builds.

#1464 is an attempt to streamline package building in the staging environment, as a first step toward more attention on refinement of the build process.

conorsch, Dec 09 '16 22:12

https://github.com/freedomofpress/securedrop/issues/1472 should also simplify deterministic builds.

psivesely, Dec 10 '16 00:12

Just following up here: the *.pyc files in 0.3.10 were definitely a regression. Looking back at older deb packages:

Found .pyc files in securedrop-app-code-0.3-amd64.deb
Found .pyc files in securedrop-app-code-0.3.1-amd64.deb
Found .pyc files in securedrop-app-code-0.3.2-amd64.deb
Found .pyc files in securedrop-app-code-0.3.3-amd64.deb
Found .pyc files in securedrop-app-code-0.3.10-amd64.deb
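A check like the one above can be scripted without dpkg. A .deb is an `ar` archive whose `data.tar.*` member holds the installed files, so this Python sketch (helper names hypothetical) walks the ar members and lists `.pyc` paths in the data tarball. It assumes that member is gzip-, bzip2- or xz-compressed, since Python's `tarfile` cannot read zstd. Where dpkg is available, `dpkg-deb --contents` gives the same listing.

```python
import io
import tarfile

AR_MAGIC = b"!<arch>\n"


def ar_members(data):
    """Yield (name, payload) for each member of a Unix ar archive."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    pos = len(AR_MAGIC)
    while pos + 60 <= len(data):
        header = data[pos:pos + 60]  # fixed-size 60-byte member header
        if header[58:60] != b"`\n":
            raise ValueError("corrupt ar member header")
        name = header[0:16].decode("ascii").strip().rstrip("/")
        size = int(header[48:58].decode("ascii").strip())
        pos += 60
        yield name, data[pos:pos + size]
        pos += size + (size % 2)  # members are padded to an even offset


def pyc_files_in_deb(path):
    """Return the sorted .pyc paths inside the package's data.tar.* member."""
    with open(path, "rb") as f:
        data = f.read()
    for name, payload in ar_members(data):
        if name.startswith("data.tar"):
            # "r:*" lets tarfile detect gzip/bz2/xz compression itself.
            with tarfile.open(fileobj=io.BytesIO(payload), mode="r:*") as tar:
                return sorted(m.name for m in tar.getmembers()
                              if m.name.endswith(".pyc"))
    raise ValueError("no data.tar member found")
```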

conorsch, Dec 10 '16 01:12

From the Debian ReproducibleBuilds FAQ, "Will Ubuntu use reproducible builds as Debian is planning to do?" Response circa 2013:

With very few exceptions, nearly all of Debian's work on this will just be going into the packages that form part of the package build toolchain, and as such Ubuntu will inherit it over the natural course of merging and syncing packages from Debian. The possible exceptions are things like the proposed libfaketime etc. preloads that we might insert into builds; I'd certainly be keen to keep up to date with things Debian does in this area, not just to protect against intrusion but also because there are immediate practical benefits to doing so (safer multiarch handling).

ageis, Mar 04 '18 14:03

This came up in a chat on Gitter, but it might be worth using Docker containers for reproducible builds. It's one simple-looking file (basically just shell), and it gives us caching of build steps; the lack of that is hugely annoying when testing anything related to deployment. The current turnaround for testing a change to postinst to know if it did the right thing is... 5 minutes? Maybe longer. Unless things have changed, Signal does this. Docker is already a part of our dev cycle, and we're trying to phase out Vagrant.

heartsucker, Apr 04 '18 16:04

Thanks to @redshiftzero's work on https://reproduciblewheels.com/, all wheels used by the securedrop-app-code package can now be reproducibly built using the following changes to the build configuration:

  1. Set SOURCE_DATE_EPOCH https://github.com/redshiftzero/reproduciblewheels/blob/main/check.py#L38
  2. Compile wheel with the proper parameters: https://github.com/redshiftzero/reproduciblewheels/blob/main/check.py#L128

This now unblocks the ability to provide reproducible builds of the securedrop-app-code Debian package from source files.
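A sketch of how a build step can honour `SOURCE_DATE_EPOCH` (a Unix timestamp in seconds, per the reproducible-builds.org specification) when choosing ZIP timestamps such as those inside a wheel. Clamping each file's mtime to that value, rather than replacing it outright, is one common convention (GNU tar's `--clamp-mtime` works this way); the helper names are hypothetical and this is not the wheel package's own code.

```python
import os
import time

# 1980-01-01T00:00:00Z: the earliest timestamp a ZIP entry can store.
ZIP_MIN_TIMESTAMP = 315532800


def build_timestamp():
    """Seconds since the Unix epoch to use as 'now' for build outputs."""
    value = os.environ.get("SOURCE_DATE_EPOCH")
    if value is None:
        return int(time.time())  # not reproducible: differs on every build
    return int(value)  # raises ValueError on a malformed value


def zip_date_time(file_mtime):
    """ZIP date_time tuple for a file: its own mtime, clamped so it is never
    later than SOURCE_DATE_EPOCH nor earlier than 1980."""
    ts = min(int(file_mtime), build_timestamp())
    ts = max(ts, ZIP_MIN_TIMESTAMP)
    return time.gmtime(ts)[:6]
```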

emkll, Aug 20 '20 18:08

Thanks to @conorsch for pointing out that passing a constant --build dir removes the final source of non-determinism for our wheel builds!

redshiftzero, Aug 20 '20 18:08