checksum-maven-plugin

Contents should include checksum and filename in standard format

Open ctubbsii opened this issue 3 years ago • 9 comments

This plugin should support writing files in a standard format, for easier verification. Standard tools have a convenient -c option to verify a checksum file, but this doesn't work with the checksums created by this plugin, because they are not in a standard format.

There are two standard file formats for use with checksum files:

  1. The GNU coreutils format used by sha512sum on GNU/Linux distributions (see man sha512sum). This outputs in the format <checksum><space><spaceInTextModeOrAsteriskInBinaryMode><filename><newline>, repeated for each file whose checksum is contained in the file (in this case, there would only be one file's checksum).
  2. The BSD format used by equivalent UNIX tools in BSD/Unix distributions, which is also supported by GNU coreutils with the --tag option (see man sha512sum). This outputs in the format <ALGNAME><space><lparen><filename><rparen><space><equal><space><checksum><newline> for each checksum.

Both of these standard formats are also supported by the shasum executable backed by the commonly used Digest::SHA perl module.
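
The two line layouts described above can be sketched as a pair of shell functions. This only illustrates the formats; it is not the plugin's code, and the function names `gnu_line` and `bsd_line` are made up for the example. The digest is assumed to be already computed.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative) that print one checksum line
# in each layout, given an already computed hex digest.

# GNU coreutils layout: <digest><space><mode marker><filename>
# The mode marker is a space for text mode or '*' for binary mode.
gnu_line() {
    digest=$1
    filename=$2
    marker=${3:-" "}
    printf '%s %s%s\n' "$digest" "$marker" "$filename"
}

# BSD layout: <ALGNAME> (<filename>) = <digest>
bsd_line() {
    printf '%s (%s) = %s\n' "$1" "$2" "$3"
}

gnu_line abc123 pom.xml          # text-mode marker (two spaces in total)
gnu_line abc123 pom.xml '*'      # binary-mode marker (space, then '*')
bsd_line SHA512 pom.xml abc123   # algorithm name is spelled out
```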

Here are some examples (using tee to show the content of the checksum file as it is written):

$ shasum -a 512 pom.xml | tee pom.xml.sha512 # using Digest::SHA to create GNU format
b004deb83fa29a7d5b6f43141f8df9f84571ba6e8800ac9a72c5c40ccbdfca2c7866568de697e163fe8c2be7ae1d2ec8b3e907a1b902e5a8ca4bbb7360a9131d  pom.xml
$ shasum -c pom.xml.sha512  # using Digest::SHA to verify GNU format
pom.xml: OK
$ shasum -a 512 --tag pom.xml | tee pom.xml.sha512  # using Digest::SHA to create BSD format
SHA512 (pom.xml) = b004deb83fa29a7d5b6f43141f8df9f84571ba6e8800ac9a72c5c40ccbdfca2c7866568de697e163fe8c2be7ae1d2ec8b3e907a1b902e5a8ca4bbb7360a9131d
$ shasum -c pom.xml.sha512  # using Digest::SHA to verify BSD format
pom.xml: OK
$ sha512sum pom.xml | tee pom.xml.sha512 # using GNU coreutils to create GNU format
b004deb83fa29a7d5b6f43141f8df9f84571ba6e8800ac9a72c5c40ccbdfca2c7866568de697e163fe8c2be7ae1d2ec8b3e907a1b902e5a8ca4bbb7360a9131d  pom.xml
$ sha512sum -c pom.xml.sha512  # using GNU coreutils to verify GNU format
pom.xml: OK
$ sha512sum --tag pom.xml | tee pom.xml.sha512  # using GNU coreutils to create BSD format
SHA512 (pom.xml) = b004deb83fa29a7d5b6f43141f8df9f84571ba6e8800ac9a72c5c40ccbdfca2c7866568de697e163fe8c2be7ae1d2ec8b3e907a1b902e5a8ca4bbb7360a9131d
$ sha512sum -c pom.xml.sha512  # using GNU coreutils to verify BSD format
pom.xml: OK

I didn't show an example with the -b binary flag for the GNU format examples, but I strongly recommend using BSD format anyway, which always uses binary mode when generating and verifying checksums.
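
For completeness, here is a small sketch of what the -b flag changes in the GNU format. It assumes GNU coreutils sha512sum is on the PATH; the helper name `show_modes` is made up for the example.

```shell
#!/bin/sh
# Sketch, assuming GNU coreutils sha512sum is installed.
# Prints the checksum line for one file twice: first in the default (text)
# mode, then in binary mode. Only the marker before the filename differs.
show_modes() {
    dir=$(dirname "$1")
    name=$(basename "$1")
    (
        cd "$dir" || exit 1
        sha512sum "$name"       # <digest><space><space><filename>
        sha512sum -b "$name"    # <digest><space>*<filename>
    )
}
```

Calling `show_modes pom.xml` prints two lines with the same digest; the second has an asterisk directly before the filename.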

For me, using this plugin is a downgrade because the file formats it emits are not easily verified with standard tools. If it output in a standard format (preferably the BSD format, because it shows the algorithm used explicitly, which will be important as SHA3 becomes more common, and always uses binary mode), this plugin would be far more useful.

ctubbsii avatar Oct 02 '21 17:10 ctubbsii

I am also looking to use shasum to verify releases with a command like this:

find . -type f -name "*.sha512" -exec shasum -c {} -a 512 \;
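
One caveat with a command like this: both standard formats record a filename, and -c resolves it relative to the current directory, so checksum files in subdirectories may not verify when find is started from a parent directory. A possible variant, assuming a find that supports -execdir (GNU and BSD find both do) and using sha512sum (shasum -a 512 should work the same way); the function name `verify_all` is made up:

```shell
#!/bin/sh
# Sketch: verify every *.sha512 file from within its own directory, so that
# the relative filename recorded inside each checksum file resolves.
# Assumes find supports -execdir and that sha512sum is on the PATH.
# Note: find's own exit status does not reflect failed checks, so read the
# per-file "OK" / "FAILED" output.
verify_all() {
    find "${1:-.}" -type f -name '*.sha512' -execdir sha512sum -c {} \;
}
```

Run it as `verify_all .` from the release directory.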

Looking at line 128 of the ArtifactsMojo class and line 145 of OneHashPerFileTarget, the GNU format already seems to be supported.

To switch this on, add <appendFilename>true</appendFilename> to the configuration.

Example usage:

<plugin>
    <groupId>net.nicoulaj.maven.plugins</groupId>
    <artifactId>checksum-maven-plugin</artifactId>
    <version>1.11</version>
    <executions>
        <execution>
            <id>calculate-checksums</id>
            <goals>
                <goal>files</goal>
            </goals>
            <!-- execute prior to maven-gpg-plugin:sign due to https://github.com/nicoulaj/checksum-maven-plugin/issues/112 -->
            <phase>post-integration-test</phase>
            <configuration>
                <appendFilename>true</appendFilename> <!-- ADD THIS LINE TO THE CONFIGURATION -->
                <algorithms>
                    <algorithm>SHA-256</algorithm>
                    <algorithm>SHA-512</algorithm>
                </algorithms>
                <!-- https://maven.apache.org/apache-resource-bundles/#source-release-assembly-descriptor -->
                <fileSets>
                    <fileSet>
                        <directory>${project.build.directory}</directory>
                        <includes>
                            <include>${myproject}-${project.version}-src.zip</include>
                            <include>${myproject}-${project.version}-src.tar.gz</include>
                            <include>${myproject}-${project.version}-bin.zip</include>
                            <include>${myproject}-${project.version}-bin.tar.gz</include>
                        </includes>
                    </fileSet>
                </fileSets>
                <csvSummary>false</csvSummary>
            </configuration>
        </execution>
    </executions>
</plugin>

remkop avatar Dec 21 '21 07:12 remkop

As far as I remember, OpenSSL produces BSD-style as well.

michael-o avatar Jan 06 '22 22:01 michael-o

I think it would be best to drop appendFilename altogether and introduce an outputFormat with an interpolator that supports three placeholders:

  • ${digest}
  • ${algorithm}
  • ${filename}

along with two symbolic names for the standard formats:

  • GNU
  • BSD
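
To make the proposal concrete, here is a rough shell sketch of how such an interpolator could behave. Nothing in it exists in the plugin: the `render` and `replace_all` functions, the argument order, and the templates behind the GNU and BSD names are all assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sketch of the proposed outputFormat option. Nothing here
# exists in the plugin; names and behavior are assumptions for illustration.

# replace_all STRING SEARCH REPLACEMENT
# Literal (non-regex) replacement of every occurrence of SEARCH.
replace_all() {
    rest=$1
    result=
    while :; do
        case $rest in
            *"$2"*)
                result=$result${rest%%"$2"*}$3   # text before the match, then the replacement
                rest=${rest#*"$2"}               # continue after the match
                ;;
            *) break ;;
        esac
    done
    printf '%s\n' "$result$rest"
}

# render FORMAT ALGORITHM DIGEST FILENAME
# FORMAT is either a symbolic name (GNU, BSD) or a custom template.
render() {
    case $1 in
        GNU) template='${digest}  ${filename}' ;;
        BSD) template='${algorithm} (${filename}) = ${digest}' ;;
        *)   template=$1 ;;
    esac
    line=$(replace_all "$template" '${algorithm}' "$2")
    line=$(replace_all "$line" '${digest}' "$3")
    # The filename goes last so that its content is never re-interpolated.
    replace_all "$line" '${filename}' "$4"
}
```

For example, `render BSD SHA512 abc123 pom.xml` would produce a BSD-style line, while `render '${filename}:${digest}' SHA512 abc123 pom.xml` would use a custom layout.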

michael-o avatar Jan 06 '22 22:01 michael-o

Also, keep in mind GNU has two formats, one for text mode input (${digest}<space><space>${filename}) and one for binary mode input (${digest}<space><star '*' literal>${filename}) (although, in practice, they are equivalent on GNU systems).

ctubbsii avatar Jan 06 '22 22:01 ctubbsii

Maybe I am completely stupid, but I really don't understand the purpose of text mode at all. All of those message digests operate on bytes. What am I missing?

michael-o avatar Jan 06 '22 22:01 michael-o

The intention of text mode was that the checksum would normalize line endings for text files: \n, \r, and \r\n would all hash the same, so files that differed only in line endings would have the same hash value. Today it is a rarely useful feature.
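
On a GNU/Linux system no such normalization takes place: the digest is computed over the raw bytes whichever mode is selected, so the LF and CRLF variants of the same text hash differently. A quick check, assuming GNU coreutils sha512sum is installed:

```shell
#!/bin/sh
# Sketch, assuming GNU coreutils sha512sum on a GNU/Linux system.
# Hashes the same text with LF and with CRLF line endings, explicitly
# requesting text mode (-t) in both cases.
lf=$(printf 'hello\n' | sha512sum -t | cut -d' ' -f1)
crlf=$(printf 'hello\r\n' | sha512sum -t | cut -d' ' -f1)

if [ "$lf" = "$crlf" ]; then
    echo "same digest"
else
    echo "different digests"
fi
```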

bondolo avatar Jan 06 '22 22:01 bondolo

Here are the relevant algorithm-to-name mappings for the BSD format: https://github.com/freebsd/freebsd-src/blob/78beb051a2661b873342162b1ec0ad55b4e27261/sbin/md5/md5.c#L122-L156

michael-o avatar Jan 06 '22 22:01 michael-o

I think this is something we really want to have for all Maven-based ASF releases.

michael-o avatar Jan 11 '22 15:01 michael-o

As ugly as this is, I was able to work around this problem with:

      <plugin>
        <artifactId>maven-antrun-plugin</artifactId>
        <version>3.0.0</version>
        <executions>
          <execution>
            <phase>post-integration-test</phase>
            <configuration>
              <target>
                <property name="spaces" value="  "/>
                <concat destfile="${project.build.directory}/apache-log4j-${project.version}-src.zip.sha256" append="yes">${spaces}apache-log4j-${project.version}-src.zip</concat>
                <concat destfile="${project.build.directory}/apache-log4j-${project.version}-src.zip.sha512" append="yes">${spaces}apache-log4j-${project.version}-src.zip</concat>
                <concat destfile="${project.build.directory}/apache-log4j-${project.version}-src.tar.gz.sha256" append="yes">${spaces}apache-log4j-${project.version}-src.tar.gz</concat>
                <concat destfile="${project.build.directory}/apache-log4j-${project.version}-src.tar.gz.sha512" append="yes">${spaces}apache-log4j-${project.version}-src.tar.gz</concat>
                <concat destfile="${project.build.directory}/apache-log4j-${project.version}-bin.zip.sha256" append="yes">${spaces}apache-log4j-${project.version}-bin.zip</concat>
                <concat destfile="${project.build.directory}/apache-log4j-${project.version}-bin.zip.sha512" append="yes">${spaces}apache-log4j-${project.version}-bin.zip</concat>
                <concat destfile="${project.build.directory}/apache-log4j-${project.version}-bin.tar.gz.sha256" append="yes">${spaces}apache-log4j-${project.version}-bin.tar.gz</concat>
                <concat destfile="${project.build.directory}/apache-log4j-${project.version}-bin.tar.gz.sha512" append="yes">${spaces}apache-log4j-${project.version}-bin.tar.gz</concat>
              </target>
            </configuration>
            <goals>
              <goal>run</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
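
A similar effect could be had outside of Maven with a short shell function run after the build. This is only a sketch: the name `to_gnu_format` is made up, and it assumes each checksum file contains a bare hex digest, sits next to its artifact, and is named after the artifact plus one extension (for example foo.zip.sha512).

```shell
#!/bin/sh
# Sketch: rewrite checksum files that contain only a bare hex digest into the
# GNU layout ("<digest><space><space><filename>").
# Assumes each checksum file is named <artifact>.<ext>, e.g. foo.zip.sha512.
to_gnu_format() {
    for sumfile in "$@"; do
        # Strip the last extension to recover the artifact's filename.
        artifact=$(basename "${sumfile%.*}")
        # Keep only the first whitespace-delimited token of the first line,
        # so running this twice on the same file gives the same result.
        digest=$(awk 'NR == 1 { print $1 }' "$sumfile")
        printf '%s  %s\n' "$digest" "$artifact" > "$sumfile"
    done
}
```

It would be called as `to_gnu_format target/*.sha256 target/*.sha512`.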

rgoers avatar Feb 21 '22 16:02 rgoers