Clippit

PresentationBuilder.PublishSlides generates slides with different data

Open f1nzer opened this issue 3 years ago • 3 comments

I'm using PresentationBuilder.PublishSlides to generate slides from an original pptx file. The problem is that this method returns non-deterministic results from run to run: each slide's DocumentByteArray contains different data, with several bytes differing between runs.

Is this expected behavior or not? Thanks.

Simple repro code (.NET SDK 6.0.100, Clippit 1.8.1):

using System.IO;
using System.Linq;
using Clippit.PowerPoint;
using DocumentFormat.OpenXml.Packaging;
using Xunit;

namespace PptxTest;

public class UnitTest1
{
    [Fact]
    public void PublishSlides_Should_GenerateSameDataInTwoRuns()
    {
        const string filePath = @"use any pptx file path";
        
        var sizesForSlides1 = SplitPptxAndGetByteSizesForSlides(filePath);
        var sizesForSlides2 = SplitPptxAndGetByteSizesForSlides(filePath);

        Assert.Equal(sizesForSlides1, sizesForSlides2);
    }

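    // Splits the presentation into single-slide documents and returns the byte size of each.
    private static int[] SplitPptxAndGetByteSizesForSlides(string filePath)
    {
        // Open the source presentation read-only.
        using var fileContentStream = File.OpenRead(filePath);
        using var document = PresentationDocument.Open(fileContentStream, false);
        var slides = PresentationBuilder.PublishSlides(document, null);

        // Only the lengths are compared here, not the byte contents.
        return slides.Select(slide => slide.DocumentByteArray.Length).ToArray();
    }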
}

f1nzer commented Nov 12 '21 10:11

I am not quite sure, but I think that ZIP archives (*.pptx, *.docx, *.xlsx) are not deterministic by their nature.

According to Wikipedia (http://en.wikipedia.org/wiki/Zip_(file_format)), zip files have header fields for the file last modification time and the file last modification date, so any zip file checked into git will appear to git to have changed whenever the zip is rebuilt from the same content. It also seems that there is no flag to prevent those headers from being set.

From SO
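
To make the timestamp point concrete, here is a minimal sketch using System.IO.Compression rather than Clippit or the Open XML SDK. The BuildZip helper, the entry name, and the entry content are hypothetical, chosen only for illustration. It assumes that the entry timestamp is the only thing that varies between builds.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper: writes one entry with fixed content into an in-memory ZIP,
// stamping it with the given last-write time.
static byte[] BuildZip(DateTimeOffset lastWriteTime)
{
    using var buffer = new MemoryStream();
    using (var archive = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
    {
        var entry = archive.CreateEntry("ppt/slides/slide1.xml");
        // In Create mode the timestamp has to be set before the entry stream is opened.
        entry.LastWriteTime = lastWriteTime;
        using var writer = new StreamWriter(entry.Open());
        writer.Write("<sld>same content</sld>");
    }
    return buffer.ToArray();
}

var first = BuildZip(new DateTimeOffset(2021, 11, 12, 10, 11, 0, TimeSpan.Zero));
var later = BuildZip(new DateTimeOffset(2021, 11, 12, 11, 12, 0, TimeSpan.Zero));
var pinned = BuildZip(new DateTimeOffset(2021, 11, 12, 10, 11, 0, TimeSpan.Zero));

// Same content, different timestamp: the sizes match but the bytes do not.
Console.WriteLine(first.Length == later.Length);
Console.WriteLine(first.SequenceEqual(later));
// Same content, same pinned timestamp: the archives are byte-for-byte identical.
Console.WriteLine(first.SequenceEqual(pinned));
```

Note that a timestamp change alone does not alter the archive size, so a test that compares only lengths would not catch it.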

sergey-tihon commented Nov 12 '21 11:11

That's interesting.

In addition to that, in my case the _rels/.rels file has several <Relationship .. tags whose Id values are unique (the file from another run has a different set of Ids). The same goes for the other rels files (see the ppt folder).
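
One way to find which parts differ between two generated packages is to hash every ZIP entry and compare by name. The sketch below is only an illustration: HashParts, DifferingParts, and MakePackage are hypothetical helpers, and the two demo packages are tiny stand-ins rather than real pptx files.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Security.Cryptography;

// Hashes the uncompressed content of every part, keyed by its path inside the ZIP.
static Dictionary<string, string> HashParts(byte[] package)
{
    var hashes = new Dictionary<string, string>();
    using var archive = new ZipArchive(new MemoryStream(package), ZipArchiveMode.Read);
    using var sha = SHA256.Create();
    foreach (var entry in archive.Entries)
    {
        using var stream = entry.Open();
        hashes[entry.FullName] = Convert.ToHexString(sha.ComputeHash(stream));
    }
    return hashes;
}

// Names of parts that are missing on one side or whose content differs.
static List<string> DifferingParts(byte[] left, byte[] right)
{
    var a = HashParts(left);
    var b = HashParts(right);
    return a.Keys.Union(b.Keys)
        .Where(name => a.GetValueOrDefault(name) != b.GetValueOrDefault(name))
        .OrderBy(name => name, StringComparer.Ordinal)
        .ToList();
}

// Demo data (hypothetical): a two-part package whose _rels/.rels content is supplied by the caller.
static byte[] MakePackage(string relsXml)
{
    using var buffer = new MemoryStream();
    using (var archive = new ZipArchive(buffer, ZipArchiveMode.Create, leaveOpen: true))
    {
        foreach (var (name, content) in new[] { ("_rels/.rels", relsXml), ("ppt/presentation.xml", "<p/>") })
        {
            using var writer = new StreamWriter(archive.CreateEntry(name).Open());
            writer.Write(content);
        }
    }
    return buffer.ToArray();
}

var diff = DifferingParts(
    MakePackage("<Relationship Id=\"Ra1\"/>"),
    MakePackage("<Relationship Id=\"Rb2\"/>"));

Console.WriteLine(string.Join(", ", diff)); // _rels/.rels
```

With real output, the two arguments would be the DocumentByteArray of the same slide from two separate runs. Because the hashes cover the uncompressed part content, ZIP header timestamps do not show up as differences here.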

f1nzer commented Nov 12 '21 12:11

Ha! You are right, OpenXmlPowerTools has historically used GUIDs as relationship IDs: https://github.com/sergey-tihon/Clippit/blob/e0da582d4f0149788429224f5bffeae4cffe96ff/OpenXmlPowerTools/PowerPoint/PresentationBuilderTools.cs#L533-L534
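
A rough illustration of why GUID-based IDs make the output differ between runs, and how counter-based IDs would not. Both generators below are hypothetical sketches, not the code at the link above, and the "R" prefix and "rId" naming are assumptions made for the example.

```csharp
using System;
using System.Linq;

// Hypothetical GUID-based generator (an assumption, not the library's code):
// every call produces fresh values.
static string[] GuidIds(int count) =>
    Enumerable.Range(0, count).Select(_ => "R" + Guid.NewGuid().ToString("N")).ToArray();

// Hypothetical counter-based alternative: rId1, rId2, ... is the same on every run.
static string[] SequentialIds(int count) =>
    Enumerable.Range(1, count).Select(i => $"rId{i}").ToArray();

var guidRunsMatch = GuidIds(3).SequenceEqual(GuidIds(3));
var sequentialRunsMatch = SequentialIds(3).SequenceEqual(SequentialIds(3));

Console.WriteLine(guidRunsMatch);        // False
Console.WriteLine(sequentialRunsMatch);  // True
```

The GUID-based IDs all have the same length, so the uncompressed rels files keep the same size from run to run. The compressed size can still vary, since different ID strings can compress differently.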

sergey-tihon commented Nov 12 '21 12:11