cyclonedx-cli

`merge` output includes UTF-8 byte-order-marks

Open · nil4 opened this issue 4 years ago · 3 comments

After upgrading cyclonedx-cli from v0.16.0 to v0.19.0, it was observed that XML BOMs produced by the merge command started being rejected when uploaded to Dependency Track v4.3.6.

The cause is that cyclonedx-cli merge now writes XML BOM files with a UTF-8 byte-order-mark, unlike the earlier version, and Dependency Track rejects such files as invalid. Arguably, that is a DT bug, which has been reported separately (https://github.com/DependencyTrack/dependency-track/issues/1214).
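A UTF-8 byte-order-mark is the three-byte sequence EF BB BF (U+FEFF encoded as UTF-8) at the very start of a file. As a minimal sketch (Java 11+; the class and method names are hypothetical, not part of cyclonedx-cli or Dependency Track), this checks whether a BOM file starts with one:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper class, for illustration only.
public class Utf8BomCheck {

    // U+FEFF encoded as UTF-8
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    /** Returns true if the given bytes begin with the UTF-8 byte-order-mark. */
    static boolean startsWithUtf8Bom(byte[] data) {
        if (data.length < UTF8_BOM.length) {
            return false;
        }
        for (int i = 0; i < UTF8_BOM.length; i++) {
            if (data[i] != UTF8_BOM[i]) {
                return false;
            }
        }
        return true;
    }

    /** Reads only the first three bytes of the file and checks them. */
    static boolean fileStartsWithUtf8Bom(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = in.readNBytes(UTF8_BOM.length); // Java 11+
            return startsWithUtf8Bom(head);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Utf8BomCheck output-0.16.xml output-0.19.xml
        for (String arg : args) {
            boolean bom = fileStartsWithUtf8Bom(Paths.get(arg));
            System.out.println(arg + ": " + (bom ? "UTF-8 BOM" : "no BOM"));
        }
    }
}
```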

Steps to reproduce

Merge the same two XML BOM files with cyclonedx-cli v0.16.0 and with v0.19.0, and compare the outputs. The two output files are identical, except for the UTF-8 byte-order-mark present in the output of the later version.

The attached ZIP file includes sample input (bom1.xml and bom2.xml) and output files: merge-sample-with-byte-order-mark.zip

> cyclonedx-win-x64 --version
0.16.0
> cyclonedx-win-x64 merge --input-files bom1.xml bom2.xml --output-file output-0.16.xml

The output-0.16.xml file has no UTF-8 byte-order-mark:

[Screenshot: merge output from 0.16, no UTF-8 byte-order-mark]

When uploaded to DT, it is processed as expected:

INFO [BomUploadProcessingTask] Processing CycloneDX BOM uploaded to project: dcb2d96d-2387-4f39-9c8d-33d195040d90

Repeat these steps with the latest version, and note that now a UTF-8 byte-order-mark is present:

> cyclonedx-win-x64 --version
0.19.0
> cyclonedx-win-x64 merge --input-files bom1.xml bom2.xml --output-file output-0.19.xml

[Screenshot: merge output from 0.19, with UTF-8 byte-order-mark]

When uploading this output-0.19.xml file to DT, it is rejected:

WARN [BomUploadProcessingTask] The BOM uploaded is not in a supported format. Supported formats include CycloneDX XML and JSON

The only difference between the accepted and rejected BOM files is the presence or absence of the UTF-8 byte-order-mark.
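That claim can be checked mechanically. A minimal sketch, assuming both files fit comfortably in memory (the class name is hypothetical): it compares the two outputs byte for byte, then again after dropping a single leading UTF-8 BOM from each. If the BOM is the only difference, the first comparison prints false and the second prints true.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
public class CompareIgnoringBom {

    /** Returns the data with one leading UTF-8 BOM (EF BB BF) removed, if present. */
    static byte[] stripLeadingUtf8Bom(byte[] data) {
        if (data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF) {
            return Arrays.copyOfRange(data, 3, data.length);
        }
        return data;
    }

    /** True if the two byte arrays are identical once any leading BOM is ignored. */
    static boolean equalIgnoringBom(byte[] a, byte[] b) {
        return Arrays.equals(stripLeadingUtf8Bom(a), stripLeadingUtf8Bom(b));
    }

    public static void main(String[] args) throws IOException {
        // Usage: java CompareIgnoringBom output-0.16.xml output-0.19.xml
        byte[] a = Files.readAllBytes(Paths.get(args[0]));
        byte[] b = Files.readAllBytes(Paths.get(args[1]));
        System.out.println("byte-identical:         " + Arrays.equals(a, b));
        System.out.println("identical ignoring BOM: " + equalIgnoringBom(a, b));
    }
}
```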

nil4, Oct 06 '21

I just ran into this when using cyclonedx-cli via the Docker image. When reading the file from Maven, BomParserFactory.createParser throws an exception on line 55, because byte 0 is not one of the expected values (123, which is '{', or 60, which is '<'). This appears to be caused by the byte-order-mark at the start of the file.
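To illustrate why a first-byte check fails, here is a hypothetical, BOM-tolerant variant of that kind of format sniffing. It is a sketch only, not the cyclonedx-core-java implementation: it skips an optional UTF-8 BOM and then looks for '{' (byte 123, JSON) or '<' (byte 60, XML). Without the skip, a BOM file's first byte is 0xEF and matches neither.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not the CycloneDX parser factory.
public class BomFormatSniffer {

    enum Format { JSON, XML, UNKNOWN }

    /** Guesses the format from the first byte after an optional UTF-8 BOM. */
    static Format detect(byte[] data) {
        int start = 0;
        if (data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF) {
            start = 3; // skip the byte-order-mark
        }
        if (start >= data.length) {
            return Format.UNKNOWN;
        }
        switch (data[start]) {
            case '{': return Format.JSON; // byte 123
            case '<': return Format.XML;  // byte 60
            default:  return Format.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        byte[] withBom = "\uFEFF<bom/>".getBytes(StandardCharsets.UTF_8);
        byte[] withoutBom = "<bom/>".getBytes(StandardCharsets.UTF_8);
        System.out.println(detect(withBom));    // XML
        System.out.println(detect(withoutBom)); // XML
    }
}
```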

In Bash, it is possible to remove the byte-order-mark with sed -i $'1s/^\uFEFF//' your-bom.xml

Not ideal, but at least it is possible to workaround.
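For builds without a shell step, a rough byte-level Java equivalent of that sed edit is sketched below, using only the JDK. Like the sed command, it removes a BOM only at the very start of the file. The class and method names are hypothetical, and the whole file is read into memory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper class, for illustration only.
public class StripBomInPlace {

    /**
     * Removes a leading UTF-8 BOM (EF BB BF) from the file, if present.
     * Returns true if the file was rewritten, false if it was left unchanged.
     */
    static boolean stripUtf8Bom(Path file) throws IOException {
        byte[] data = Files.readAllBytes(file);
        boolean hasBom = data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
        if (!hasBom) {
            return false;
        }
        // Write back everything after the three BOM bytes, unchanged.
        Files.write(file, Arrays.copyOfRange(data, 3, data.length));
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java StripBomInPlace your-bom.xml
        for (String arg : args) {
            boolean changed = stripUtf8Bom(Paths.get(arg));
            System.out.println(arg + (changed ? ": BOM removed" : ": unchanged"));
        }
    }
}
```

Working on raw bytes, rather than decoding and re-encoding the text, leaves the rest of the file byte-for-byte identical.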

roadSurfer, Jan 24 '23

I have been using the longer Java approach, as it was easier to include in my process than a Bash script.

Something like:

import org.apache.commons.io.FileUtils;
import org.apache.commons.io.IOUtils;
import org.apache.commons.io.input.BOMInputStream;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class RemoveByteOrderMarks {

    /**
     * Rewrites the file in place without a leading UTF-8 byte-order-mark.
     *
     * @see "https://github.com/CycloneDX/cyclonedx-cli/issues/178"
     * @see "https://stackoverflow.com/questions/21891578/removing-bom-characters-using-java"
     */
    public static void removeUTF8ByteOrderMarks(File file) throws IOException {
        System.out.println("Removing Byte-Order-Marks from BOM file " + file.getAbsolutePath());
        final String content;
        // BOMInputStream (default constructor) detects and skips a leading UTF-8 BOM.
        // try-with-resources closes the input stream before the file is overwritten.
        try (InputStream in = new BOMInputStream(new FileInputStream(file))) {
            content = IOUtils.toString(in, StandardCharsets.UTF_8);
        }
        FileUtils.writeStringToFile(file, content, StandardCharsets.UTF_8);
    }
}

jgraglia, Jan 24 '23