jinterface: Build deterministic OtpErlang.jar

Open avtobiff opened this issue 3 years ago • 10 comments

Handroll lib/jinterface/priv/OtpErlang.jar to support a deterministic build.

Method used: Manually craft META-INF/MANIFEST.MF, touch all JAR file contents with the same timestamp across builds, generate a deterministic ZIP file (the JAR file).
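The three steps above (minimal manifest, one fixed timestamp, stable archive layout) can also be sketched in Java. This is only an illustration under stated assumptions, not the Makefile change in this PR: the class name DeterministicJar, the in-memory file map, and the example epoch are hypothetical, and it needs Java 9 or later for ZipEntry.setTimeLocal.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarOutputStream;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;

// Hypothetical helper for illustration; not part of jinterface or this PR.
final class DeterministicJar {
    // Hand-written manifest holding only the required Manifest-Version line.
    private static final byte[] MANIFEST =
            "Manifest-Version: 1.0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);

    /** Builds a JAR in memory whose bytes depend only on the arguments. */
    static byte[] build(Map<String, byte[]> files, long epochSeconds) throws IOException {
        // Interpret the epoch as UTC wall-clock time so the stored DOS
        // timestamp does not change with the build machine's time zone.
        LocalDateTime stamp = LocalDateTime.ofEpochSecond(epochSeconds, 0, ZoneOffset.UTC);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(out)) {
            // Write the manifest by hand so its entry gets the fixed timestamp too.
            put(jar, "META-INF/MANIFEST.MF", MANIFEST, stamp);
            // A TreeMap copy gives a sorted, stable entry order.
            for (Map.Entry<String, byte[]> e : new TreeMap<>(files).entrySet()) {
                put(jar, e.getKey(), e.getValue(), stamp);
            }
        }
        return out.toByteArray();
    }

    private static void put(JarOutputStream jar, String name, byte[] data,
                            LocalDateTime stamp) throws IOException {
        ZipEntry entry = new ZipEntry(name);
        entry.setTimeLocal(stamp);
        // STORED entries avoid any dependence on the compressor; they need
        // size and CRC-32 set before the entry is opened.
        entry.setMethod(ZipEntry.STORED);
        entry.setSize(data.length);
        entry.setCompressedSize(data.length);
        CRC32 crc = new CRC32();
        crc.update(data);
        entry.setCrc(crc.getValue());
        jar.putNextEntry(entry);
        jar.write(data);
        jar.closeEntry();
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> files = Map.of(
                "com/example/B.class", new byte[] {1, 2, 3},  // placeholder bytes
                "com/example/A.class", new byte[] {4, 5});
        long epoch = 1640995200L; // 2022-01-01T00:00:00Z, an arbitrary example value
        byte[] first = build(files, epoch);
        byte[] second = build(files, epoch);
        System.out.println("identical: " + Arrays.equals(first, second));
    }
}
```

The sketch uses JarOutputStream without a Manifest argument on purpose: with that constructor the caller controls the manifest entry and its timestamp.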

Deterministic builds are not supported on the win32 target for now.

See #4417

Signed-off-by: Per Andersson [email protected]

avtobiff avatar Jan 05 '22 01:01 avtobiff

With my limited Java knowledge, it looks a little bit iffy to create your own jar file like this with a hard coded manifest. Would it not be less hackish to let jar create the .jar file, then unzip it, touch --date all files within, and then (re)zip it?

sverker avatar Jan 10 '22 22:01 sverker

Before commenting on the method suggested in this PR: it is possible to have the timestamp of the files set to a known value instead of something arbitrary. This is standardised in the environment variable SOURCE_DATE_EPOCH [0].

Is this preferable? If so, reproducible builds should probably be documented somewhere.

[0] https://reproducible-builds.org/docs/source-date-epoch/
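As a rough sketch of what honouring SOURCE_DATE_EPOCH could look like in Java: read the variable, parse it as seconds since the Unix epoch, and fall back to a fixed value when it is unset. The class name and the fallback value are assumptions made for this example; nothing here is existing OTP build code.

```java
import java.time.Instant;

// Hypothetical helper for illustration only.
final class SourceDateEpoch {
    // Assumed default when SOURCE_DATE_EPOCH is not set (2022-01-01T00:00:00Z).
    static final long FALLBACK = 1640995200L;

    /** Returns the parsed value, or the fallback when the variable is unset or blank. */
    static long resolve(String value, long fallback) {
        if (value == null || value.isBlank()) {
            return fallback;
        }
        // A malformed value is left to fail loudly with NumberFormatException.
        return Long.parseLong(value.trim());
    }

    public static void main(String[] args) {
        long epoch = resolve(System.getenv("SOURCE_DATE_EPOCH"), FALLBACK);
        System.out.println(epoch + " = " + Instant.ofEpochSecond(epoch));
    }
}
```

Failing on a malformed value rather than silently using the fallback is a design choice of this sketch; a build that asks for reproducibility probably should not guess.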

With my limited java knowledge, it looks a little bit iffy to create your own jar file like this with a hard coded manifest. Would it not be less hackish to let jar create the .jar file, then unzip it, touch --date of all files within, and then (re)zip it.

The manifest generated by jar on my machine is

Manifest-Version: 1.0
Created-By: 18-ea (Debian)

The only additional thing that would be included is Created-By (which is generated by the jar tool, showing what java implementation was used to generate the jar) [1].

A default manifest only includes Manifest-Version and Created-By [2].

However, the manifest specification includes a required main-section, which in turn requires only version-info; all other attributes are optional. [3]

manifest-file: main-section newline *individual-section
main-section: version-info newline *main-attribute
version-info: Manifest-Version : version-number

My reasoning was that Created-By was not important information to convey, but I might be wrong. It seemed wasteful to first generate a jar (i.e. zip file), then unzip it, and then recreate it, when a jar file can be created from scratch without an extra jar/unzip step.
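For comparison, a manifest with only the required version-info can also be produced through java.util.jar.Manifest instead of a hand-written file. The sketch below is illustrative; the class name MinimalManifest is made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper for illustration only.
final class MinimalManifest {
    /** Serialises a manifest that carries nothing but Manifest-Version. */
    static byte[] bytes() throws IOException {
        Manifest manifest = new Manifest();
        // Manifest.write only emits the main section when a version is set.
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        manifest.write(out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.print(new String(bytes(), StandardCharsets.UTF_8));
    }
}
```

Created-By is added by the jar tool, not by the Manifest class, so it does not appear unless the caller adds it.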

What do others do?

Created-By is stripped by the reproducible-build-maven-plugin. [4]

The method suggested in this PR was inspired by Gary Rowe's blogpost on How to create a deterministic JAR. [5]

I don't think creating a JAR file with jar, unzipping, fixing timestamps, then zipping it again will add much.

Generating the jar file with zip will not add the JAR file magic (0xCAFE), so the file will effectively be a plain zip file, while a file generated by jar would have the magic and present itself as a JAR:

$ file OtpErlang.jar
OtpErlang.jar: Zip archive data, at least v1.0 to extract, compression method=store
$ file OtpErlang.jar.jar
OtpErlang.jar.jar: Java archive data (JAR)
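The same distinction can be checked from Java by looking at the extra field of the first archive entry, which is where JarOutputStream places the 0xCAFE header ID. This is a sketch for illustration: the class name, the sample entry, and the in-memory archives are assumptions, and it inspects only the first local header rather than reproducing what file(1) does.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.jar.JarOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for illustration only.
final class JarMagic {
    private static final int JAR_MAGIC = 0xCAFE;

    /** True if the first entry's extra data contains a field with ID 0xCAFE. */
    static boolean hasJarMagic(byte[] archive) throws IOException {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry first = in.getNextEntry();
            byte[] extra = first == null ? null : first.getExtra();
            if (extra == null) {
                return false;
            }
            // Extra data is a list of (id, length, payload) records,
            // with id and length stored as little-endian 16-bit values.
            int i = 0;
            while (i + 4 <= extra.length) {
                int id = (extra[i] & 0xff) | ((extra[i + 1] & 0xff) << 8);
                int len = (extra[i + 2] & 0xff) | ((extra[i + 3] & 0xff) << 8);
                if (id == JAR_MAGIC) {
                    return true;
                }
                i += 4 + len;
            }
            return false;
        }
    }

    /** Builds a one-entry sample archive with either stream class. */
    static byte[] sample(boolean viaJarStream) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipOutputStream zip =
                     viaJarStream ? new JarOutputStream(out) : new ZipOutputStream(out)) {
            zip.putNextEntry(new ZipEntry("hello.txt"));
            zip.write(new byte[] {'h', 'i'});
            zip.closeEntry();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("JarOutputStream: " + hasJarMagic(sample(true)));
        System.out.println("ZipOutputStream: " + hasJarMagic(sample(false)));
    }
}
```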

However, this doesn't seem to bother either jar or javac, which both understand it fine:

$ jar -tf OtpErlang.jar
META-INF/
META-INF/MANIFEST.MF
com/ericsson/otp/erlang/
com/ericsson/otp/erlang/OtpMD5.class
(...)
$ cat Test.java
import com.ericsson.otp.erlang.*;
public class Test { public static void main(String args[]) { ; } }
$ javac -classpath OtpErlang.jar Test.java
$ echo $?
0
$ file Test.class
Test.class: compiled Java class data, version 62.0

Trying to use an empty (i.e. corrupted) jar file generates the following error:

$ touch Empty.jar
$ javac -classpath Empty.jar Test.java
error: error reading Empty.jar; zip file is empty

Another option is to use strip-nondeterminism [6] if available, which will produce basically the same result. This will add another build dependency though. The JAR file magic will be present and the MANIFEST.MF will be kept as generated by jar.

[1] https://docs.oracle.com/en/java/javase/17/docs/specs/jar/jar.html#main-attributes
[2] https://docs.oracle.com/javase/tutorial/deployment/jar/defman.html
[3] https://docs.oracle.com/en/java/javase/17/docs/specs/jar/jar.html#manifest-specification
[4] https://github.com/Zlika/reproducible-build-maven-plugin/blob/master/src/main/java/io/github/zlika/reproducible/ManifestStripper.java#L24-L32
[5] https://gary-rowe.com/2013-08-08-how-to-create-a-deterministic-jar/
[6] https://reproducible-builds.org/tools/

avtobiff avatar Jan 11 '22 00:01 avtobiff

Ok, I yield about handrolled manifest.

However, I discovered a disadvantage with the current use of touch --date on the existing class files. It disables fast incremental builds. Repeated invocations of make in lib/jinterface/ will recompile all java files as they all look newer than the "old" class files.

sverker avatar Jan 11 '22 14:01 sverker

Ok, I yield about handrolled manifest.

I know it was a handful, but I had to do all this research myself so I might as well present it. :)

I would like to keep the jar file magic; I'm not particularly content with the jar file being identified as a zip file. I'll see if there is another way, perhaps creating a Java program which uses java.util.jar.

However, I discovered a disadvantage with the current use of touch --date on the existing class files. It disables fast incremental builds. Repeated invocations of make in lib/jinterface/ will recompile all java files as they all look newer than the "old" class files.

I'll investigate if the jar can be assembled so fast incremental builds are not affected. The jar contents can perhaps be copied to a temporary build directory where the timestamps are fixed.
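A rough Java sketch of that staging idea: copy the compiled class tree to a temporary directory and set the fixed timestamp only on the copies, so the originals keep the modification times that make relies on. The class name and the temporary directory prefix are hypothetical, and this has not been tried against the jinterface Makefile.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.time.Instant;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for illustration only.
final class StagingCopy {
    /** Copies classesDir into a fresh temp directory and stamps every copied path. */
    static Path stage(Path classesDir, long epochSeconds) throws IOException {
        Path staging = Files.createTempDirectory("jar-staging");

        List<Path> sources;
        try (Stream<Path> walk = Files.walk(classesDir)) {
            // Files.walk visits a directory before its contents.
            sources = walk.collect(Collectors.toList());
        }
        for (Path src : sources) {
            Path dst = staging.resolve(classesDir.relativize(src).toString());
            if (Files.isDirectory(src)) {
                Files.createDirectories(dst);
            } else {
                Files.copy(src, dst);
            }
        }

        // Stamp only after all copies exist, since creating files
        // updates the modification time of their parent directory.
        FileTime stamp = FileTime.from(Instant.ofEpochSecond(epochSeconds));
        List<Path> staged;
        try (Stream<Path> walk = Files.walk(staging)) {
            staged = walk.collect(Collectors.toList());
        }
        for (Path p : staged) {
            Files.setLastModifiedTime(p, stamp);
        }
        return staging;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java StagingCopy.java <classes-dir>
        Path staged = stage(Path.of(args[0]), 1640995200L); // example epoch
        System.out.println("staged in " + staged);
    }
}
```

The archive would then be assembled from the staging directory, leaving the real build tree untouched.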

I would also like to raise that I do not have any possibility to test this on mac, win32, or e.g. any BSD. I know date can be different across platforms. Is it ok to disable reproducible builds on win32?

What about the outstanding question about SOURCE_DATE_EPOCH? Should that be used, if set, instead of a hardcoded value? If so, where should this documentation go?

avtobiff avatar Jan 11 '22 16:01 avtobiff

I'm leaning towards using strip-nondeterminism for this. It retains the jar file magic and it doesn't touch the class files.

It could be an optional build dependency; if you want deterministic builds of jinterface, install strip-nondeterminism.

sverker avatar Jan 11 '22 20:01 sverker

I can redo this PR to use strip-nondeterminism.

Maybe it should still be configurable whether to use it or not though? I have it installed but might not want to uninstall it just to skip building a deterministic build.

Is it ok to check if SOURCE_DATE_EPOCH is set, and if it is then use

strip-nondeterminism --timestamp $SOURCE_DATE_EPOCH OtpErlang.jar

avtobiff avatar Jan 12 '22 13:01 avtobiff

Maybe it should still be configurable to use it or not though? I have it installed but might not want to uninstall just to skip building a deterministic build.

Why would you want to avoid deterministic build?

I don't really see the point with SOURCE_DATE_EPOCH here. Are the file timestamps in the jar file really used for anything? But if you see some use for it, ok.

sverker avatar Jan 13 '22 19:01 sverker

When I Googled around I saw this https://bugs.openjdk.java.net/browse/JDK-8276667 which seems to be an update of the jar command to respect SOURCE_DATE_EPOCH in order to support reproducible builds of jar archives.

KennethL avatar Jan 14 '22 12:01 KennethL

When I Googled around I saw this https://bugs.openjdk.java.net/browse/JDK-8276667 which seems to be an update of the jar command to respect SOURCE_DATE_EPOCH in order to support reproducible builds of jar archives.

We could wait for that and this becomes a documentation issue instead.

avtobiff avatar Jan 14 '22 16:01 avtobiff

It seems like jar will get a new option --date to set timestamp on archived files. [0] If I understood correctly it will be released with OpenJDK 17.0.3 in April 2022. [1]

There are more fixes related to [0], e.g. archive file ordering and zip archive generation which will further help deterministic jar generation.

I'll rework this PR to use jar --date if the SOURCE_DATE_EPOCH environment variable is set, once a released OpenJDK jar supports it.
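A small sketch of how the epoch could be turned into a jar --date argument. It assumes the option accepts an ISO-8601 offset date-time such as 2022-01-01T00:00:00Z, which should be checked against the jar documentation of the JDK actually used; the class name, the jar file name argument, and the classes directory are placeholders.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.List;

// Hypothetical helper for illustration only.
final class JarDateCommand {
    /** Formats seconds since the epoch as an ISO-8601 date-time in UTC. */
    static String isoTimestamp(long epochSeconds) {
        return DateTimeFormatter.ISO_OFFSET_DATE_TIME.format(
                Instant.ofEpochSecond(epochSeconds).atOffset(ZoneOffset.UTC));
    }

    /** Builds (but does not run) a jar command line using --date. */
    static List<String> command(long epochSeconds, String jarFile, String classesDir) {
        return List.of(
                "jar", "--create",
                "--file=" + jarFile,
                "--date=" + isoTimestamp(epochSeconds), // assumed --date format
                "-C", classesDir, ".");
    }

    public static void main(String[] args) {
        // 1640995200 is an example value standing in for SOURCE_DATE_EPOCH.
        System.out.println(String.join(" ", command(1640995200L, "OtpErlang.jar", "classes")));
    }
}
```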

[0] https://bugs.openjdk.java.net/browse/JDK-8276766
[1] https://wiki.openjdk.java.net/display/JDKUpdates/JDK+17u

avtobiff avatar Jan 16 '22 19:01 avtobiff