Open-XML-SDK

The second attempt to open an unchanged presentation file for edit throws a "file contains corrupted data" error.

Status: Open. Kalle4242 opened this issue 3 years ago • 6 comments

Description

We are trying to access a PowerPoint presentation in a SharePoint Online document library via Microsoft Graph.

Our PowerPoint add-in adds data as CustomXmlParts to the presentation and to the slides as well. The presentation can be opened, manipulated and saved without problems as long as the working files are on a local computer.

The problem occurs when the file is uploaded to a SharePoint document library. I open a small example presentation file as a stream, copy it to a memory stream and pass that stream to PresentationDocument to open it for editing. I do not change anything, and after saving the presentation I copy the stream back to SharePoint, replacing the previously opened file. This works well! But when I try this a second time, a "File contains corrupted data." error occurs while opening the presentation document for editing from the memory stream.

The error's stack trace suggests that "MS.Internal.IO.Zip.ZipIOLocalFileBlock.Validate" has a problem with the data from the second stream:

   at MS.Internal.IO.Zip.ZipIOLocalFileBlock.Validate(String fileName, ZipIOCentralDirectoryBlock centralDir, ZipIOCentralDirectoryFileHeader centralDirFileHeader)
   at MS.Internal.IO.Zip.ZipIOLocalFileBlock.ParseRecord(BinaryReader reader, String fileName, Int64 position, ZipIOCentralDirectoryBlock centralDir, ZipIOCentralDirectoryFileHeader centralDirFileHeader)
   at MS.Internal.IO.Zip.ZipIOLocalFileBlock.SeekableLoad(ZipIOBlockManager blockManager, String fileName)
   at MS.Internal.IO.Zip.ZipIOBlockManager.LoadLocalFileBlock(String zipFileName)
   at MS.Internal.IO.Zip.ZipArchive.GetFile(String zipFileName)
   at MS.Internal.IO.Zip.ZipArchive.GetFiles()
   at System.IO.Packaging.ZipPackage.ContentTypeHelper..ctor(ZipArchive zipArchive, IgnoredItemHelper ignoredItemHelper)
   at System.IO.Packaging.ZipPackage..ctor(Stream s, FileMode mode, FileAccess access, Boolean streaming)
   at System.IO.Packaging.Package.Open(Stream stream, FileMode packageMode, FileAccess packageAccess, Boolean streaming)
   at DocumentFormat.OpenXml.Packaging.PackageLoader.OpenCore(Stream stream, Boolean readWriteMode)
   at DocumentFormat.OpenXml.Packaging.PresentationDocument.Open(Stream stream, Boolean isEditable, OpenSettings openSettings)
   at DocumentFormat.OpenXml.Packaging.PresentationDocument.Open(Stream stream, Boolean isEditable)
   ...

Even the "corrupted" file can be opened from SharePoint in the PowerPoint browser app. After a small change, it is saved automatically, and after that I can reopen and save it as often as I like. It seems as if this has a healing effect on the file.

I know that different opening and saving mechanisms use different methods to optimize Open XML files. I also know that these mechanisms use different ZIP implementations, but I cannot control which one is used for my file.

Comparing both files (corrupted and healed) shows that the CustomXmlParts are reorganized and renumbered in the healed one, but I cannot find a crucial difference.
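One way to make that comparison repeatable is to diff the two packages entry by entry. The following is a minimal sketch, assuming both files can still be read by System.IO.Compression's `ZipArchive` (not guaranteed for a file that System.IO.Packaging rejects). `PackageDiff` and `Compare` are hypothetical names, not part of the Open XML SDK.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper: compares two ZIP-based Open XML packages entry by entry,
// using a SHA-256 hash of each entry's uncompressed content.
public static class PackageDiff
{
    public static IReadOnlyList<string> Compare(Stream left, Stream right)
    {
        Dictionary<string, string> a = HashEntries(left);
        Dictionary<string, string> b = HashEntries(right);
        var differences = new List<string>();

        foreach (string name in a.Keys.Union(b.Keys).OrderBy(n => n, StringComparer.Ordinal))
        {
            if (!b.ContainsKey(name)) differences.Add("only in first:   " + name);
            else if (!a.ContainsKey(name)) differences.Add("only in second:  " + name);
            else if (a[name] != b[name]) differences.Add("content differs: " + name);
        }

        return differences;
    }

    private static Dictionary<string, string> HashEntries(Stream package)
    {
        var result = new Dictionary<string, string>(StringComparer.Ordinal);
        using (var archive = new ZipArchive(package, ZipArchiveMode.Read, leaveOpen: true))
        using (SHA256 sha = SHA256.Create())
        {
            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                using (Stream content = entry.Open())
                {
                    result[entry.FullName] = Convert.ToBase64String(sha.ComputeHash(content));
                }
            }
        }
        return result;
    }
}
```

To use it, pass two streams opened with `File.OpenRead`, for example one for CorruptedAtSecondOpen.pptx and one for the healed copy. Because entries are matched by name, renumbered parts such as `customXml/itemProps10.xml` show up as "only in" lines rather than as pairs.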

The question is: what goes wrong, or what is processed differently, in "MS.Internal.IO.Zip.ZipIOLocalFileBlock.Validate"? What does it reject that is accepted elsewhere?
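I can't say exactly which checks `ZipIOLocalFileBlock.Validate` performs. The ZIP specification (PKWARE's APPNOTE) does, however, require each local file header to agree with its central directory entry on the file name, compression method and flags, and, unless bit 3 of the flags announces a trailing data descriptor, on the CRC-32 and sizes. A minimal sketch that reports entries where these disagree might look like this; `ZipHeaderCheck` is a hypothetical diagnostic, it ignores ZIP64 archives, and it is not a reimplementation of the internal validator.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical diagnostic: compares every central directory entry of a ZIP file with
// the local file header it points to. ZIP64 archives are not handled.
public static class ZipHeaderCheck
{
    private const uint EndOfCentralDirSignature = 0x06054b50;
    private const uint CentralDirSignature = 0x02014b50;
    private const uint LocalHeaderSignature = 0x04034b50;

    public static List<string> FindMismatches(Stream zip)
    {
        var problems = new List<string>();
        using (var reader = new BinaryReader(zip, Encoding.UTF8, leaveOpen: true))
        {
            long eocd = FindEndOfCentralDirectory(reader);
            if (eocd < 0) { problems.Add("End of central directory record not found."); return problems; }

            zip.Position = eocd + 10;              // total number of entries
            ushort entryCount = reader.ReadUInt16();
            reader.ReadUInt32();                   // size of the central directory (unused)
            long position = reader.ReadUInt32();   // offset of the central directory

            for (int i = 0; i < entryCount; i++)
            {
                zip.Position = position;
                if (reader.ReadUInt32() != CentralDirSignature) { problems.Add($"Entry {i}: bad central directory signature."); break; }
                zip.Position = position + 8;
                ushort flags = reader.ReadUInt16();
                ushort method = reader.ReadUInt16();
                zip.Position = position + 16;
                uint crc = reader.ReadUInt32();
                uint compressed = reader.ReadUInt32();
                uint uncompressed = reader.ReadUInt32();
                ushort nameLength = reader.ReadUInt16();
                ushort extraLength = reader.ReadUInt16();
                ushort commentLength = reader.ReadUInt16();
                zip.Position = position + 42;
                long localOffset = reader.ReadUInt32();
                byte[] name = reader.ReadBytes(nameLength);
                string display = Encoding.UTF8.GetString(name);
                position += 46 + nameLength + extraLength + commentLength;

                if (localOffset + 30 > zip.Length) { problems.Add($"{display}: local header offset {localOffset} is past the end."); continue; }
                zip.Position = localOffset;
                if (reader.ReadUInt32() != LocalHeaderSignature) { problems.Add($"{display}: no local file header at offset {localOffset}."); continue; }
                zip.Position = localOffset + 6;
                ushort localFlags = reader.ReadUInt16();
                ushort localMethod = reader.ReadUInt16();
                zip.Position = localOffset + 14;
                uint localCrc = reader.ReadUInt32();
                uint localCompressed = reader.ReadUInt32();
                uint localUncompressed = reader.ReadUInt32();
                ushort localNameLength = reader.ReadUInt16();
                zip.Position = localOffset + 30;
                byte[] localName = reader.ReadBytes(localNameLength);

                if (!name.SequenceEqual(localName)) problems.Add($"{display}: file name differs from local header.");
                if (method != localMethod) problems.Add($"{display}: compression method {method} vs. local {localMethod}.");
                if (flags != localFlags) problems.Add($"{display}: flags 0x{flags:X4} vs. local 0x{localFlags:X4}.");
                // With bit 3 set, CRC and sizes follow the data in a data descriptor,
                // so the local header may legitimately contain zeros.
                bool hasDataDescriptor = (localFlags & 0x0008) != 0;
                if (!hasDataDescriptor && (crc != localCrc || compressed != localCompressed || uncompressed != localUncompressed))
                    problems.Add($"{display}: CRC-32 or sizes differ from local header.");
            }
        }
        return problems;
    }

    // The end of central directory record is at least 22 bytes long and may be
    // followed by a comment of up to 65535 bytes, so search backwards from the end.
    private static long FindEndOfCentralDirectory(BinaryReader reader)
    {
        Stream s = reader.BaseStream;
        long lowest = Math.Max(0, s.Length - 22 - 65535);
        for (long p = s.Length - 22; p >= lowest; p--)
        {
            s.Position = p;
            if (reader.ReadUInt32() == EndOfCentralDirSignature) return p;
        }
        return -1;
    }
}
```

Running it over the file that fails on the second open, and over the healed copy, would at least show whether the two differ at the header level or only in their content.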

CorruptedAtSecondOpen.pptx

Information

  • .NET Target: Net4.8
  • DocumentFormat.OpenXml Version: (2.15.0)

Repro

        private bool OpenCloseAndSave(IReadOnlyStorageAccessInfo presAccessInfo)
        {
            try
            {
                using (MemoryStream memStream = OpenPresentationAsWritableStream(presAccessInfo))
                {
                    using (var presentationDocument = PresentationDocument.Open(memStream, true))
                    {
                        // Save the presentation.
                        presentationDocument.PresentationPart.Presentation.Save();
                        presentationDocument.Save();
                        presentationDocument.Close();

                        if (presAccessInfo is IEditableStorageAccessInfo editablePresentationStorageInfo)
                        {
                            editablePresentationStorageInfo.SaveStream(memStream);
                        }
                    }
                }
                return true;
            }
            catch (Exception ex)
            {
                Logging.Log.Error(nameof(TestSharePointDocUpload), nameof(OpenCloseAndSave), ex);
            }
            return false;
        }

Observed

Opening a file that the user has not changed causes a "file contains corrupted data" error on the second attempt.

Expected

A file that the user has not changed should open without error the second time, just as it did the first time.

Kalle4242, Mar 15 '22 08:03

I have inlined the code of the called methods into my example method; I hope it is now easier to follow.

        private bool OpenCloseAndSave(string filename)
        {
            try
            {
                var pathToTestFile = Path.Combine(_pathToTestLocation, _nameOfSpUploadTestLocation, filename);

                using (MemoryStream contentStream = new MemoryStream())
                {
                    IDriveItemRequestBuilder driveItemRequestBuilderDownload = GetGraphServiceClient().Sites.Root.Lists[_nameOfTestDocumentLibrary].Drive.Root.ItemWithPath(pathToTestFile);

                    using (Stream stream = driveItemRequestBuilderDownload.Content.Request().GetAsync().GetAwaiter().GetResult())
                    {
                        stream.CopyTo(contentStream);
                    }

                    using (var presentationDocument = PresentationDocument.Open(contentStream, true))
                    {
                        // Save the (unchanged) presentation back into contentStream.
                        // Disposing the document at the end of this block also closes it.
                        presentationDocument.PresentationPart.Presentation.Save();
                        presentationDocument.Save();
                    }

                    // Rewind the stream and upload it, replacing the file on SharePoint.
                    contentStream.Position = 0;
                    IDriveItemRequestBuilder driveItemRequestBuilderUpload = GetGraphServiceClient().Sites.Root.Lists[_nameOfTestDocumentLibrary].Drive.Root.ItemWithPath(pathToTestFile);
                    DriveItem driveItem = driveItemRequestBuilderUpload.Content.Request().PutAsync<DriveItem>(contentStream).GetAwaiter().GetResult();
                }
                return true;
            }
            catch (AggregateException ae)
            {
                Debug.Print(ae.Flatten().Message);
            }
            catch (Exception ex)
            {
                Debug.Print(ex.Message);
            }
            return false;
        }

By the way: I'm not downloading, opening, saving, closing and uploading just for fun. Usually I manipulate the presentation or slides and their accompanying CustomXmlParts between opening and saving. I just reduced it to the simplest example that shows the issue.

Kalle4242, Mar 15 '22 14:03

@Kalle4242, I looked at your issue and found that your presentation (attached in your original post) is deemed corrupt by the Open XML SDK when trying to open it the first time:

DocumentFormat.OpenXml.Packaging.OpenXmlPackageException
The document cannot be opened because there is an invalid part with an unexpected content type. 
[Part Uri=/customXml/itemProps10.xml], 
[Content Type=application/xml], 
[Expected Content Type=application/vnd.openxmlformats-officedocument.customXmlProperties+xml].

I don't know how you created the CustomXmlPart, but there must be something wrong with how you are using the SDK.
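The mismatch can also be confirmed without the SDK by reading `[Content_Types].xml` directly: each `/customXml/itemProps*.xml` part should resolve, through an `Override` element or the `Default` for its extension, to `application/vnd.openxmlformats-officedocument.customXmlProperties+xml`. The sketch below is hypothetical (`CustomXmlPropsCheck` is not an SDK type) and assumes the custom XML property parts live under `/customXml/`, as in the error above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

// Hypothetical check: reports /customXml/itemProps*.xml parts whose content type, as
// declared in [Content_Types].xml, is not the one expected for custom XML properties.
public static class CustomXmlPropsCheck
{
    private const string ExpectedType =
        "application/vnd.openxmlformats-officedocument.customXmlProperties+xml";
    private static readonly XNamespace Ct =
        "http://schemas.openxmlformats.org/package/2006/content-types";

    public static List<string> FindWrongContentTypes(Stream package)
    {
        var problems = new List<string>();
        using (var archive = new ZipArchive(package, ZipArchiveMode.Read, leaveOpen: true))
        {
            ZipArchiveEntry typesEntry = archive.GetEntry("[Content_Types].xml");
            if (typesEntry == null)
            {
                problems.Add("[Content_Types].xml is missing.");
                return problems;
            }

            XElement types;
            using (Stream s = typesEntry.Open()) types = XElement.Load(s);

            // Overrides are keyed by part name, defaults by file extension.
            var overrides = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
            foreach (XElement o in types.Elements(Ct + "Override"))
                overrides[(string)o.Attribute("PartName") ?? ""] = (string)o.Attribute("ContentType");
            var defaults = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
            foreach (XElement d in types.Elements(Ct + "Default"))
                defaults[(string)d.Attribute("Extension") ?? ""] = (string)d.Attribute("ContentType");

            foreach (ZipArchiveEntry entry in archive.Entries)
            {
                string partName = "/" + entry.FullName;
                if (!partName.StartsWith("/customXml/itemProps", StringComparison.OrdinalIgnoreCase)) continue;

                // An Override for the part wins; otherwise the Default for its extension applies.
                if (!overrides.TryGetValue(partName, out string contentType))
                    defaults.TryGetValue(Path.GetExtension(partName).TrimStart('.'), out contentType);

                if (!string.Equals(contentType, ExpectedType, StringComparison.OrdinalIgnoreCase))
                    problems.Add($"{partName}: {contentType ?? "(no content type)"}");
            }
        }
        return problems;
    }
}
```

For the attached CorruptedAtSecondOpen.pptx, one would expect `/customXml/itemProps10.xml` to be reported with `application/xml`, matching the exception message.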

After fixing your presentation (by making a small change and saving), I then created my own integration test and could not reproduce the error, meaning everything worked just fine. Here's my code, which makes use of a ClientSecretCredential and addresses a drive directly (to make it easier for me):

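using System;
using System.IO;
using System.Threading.Tasks;
using System.Xml.Linq;
using Azure.Identity;
using DocumentFormat.OpenXml.Packaging;
using Microsoft.Graph;
using Xunit;
using Xunit.Abstractions;

public class OpenXmlIntegrationTests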
{
    private const string TenantId = "your-tenant-id";
    private const string ClientId = "your-client-id";
    private const string ClientSecret = "your-client-secret";

    private const string DriveId = "your-drive-id";
    private const string ItemPath = "CorruptedAtSecondOpen.pptx";

    public OpenXmlIntegrationTests(ITestOutputHelper output)
    {
        Output = output;
        Client = new GraphServiceClient(new ClientSecretCredential(TenantId, ClientId, ClientSecret));
    }

    private ITestOutputHelper Output { get; }

    private GraphServiceClient Client { get; }

    [Fact]
    public async Task TestOpenXmlIntegrationAsync()
    {
        for (var i = 0; i < 3; i++)
        {
            // Download the PowerPoint presentation from Microsoft Graph.
            await using MemoryStream content = await GetContentAsync(DriveId, ItemPath);

            // Change the PowerPoint presentation, adding a new CustomXmlPart.
            ChangeContent(content);

            // Upload the changed PowerPoint presentation to Microsoft Graph.
            DriveItem driveItem = await PutContentAsync(content, DriveId, ItemPath);

            // Note that each time the file hash is different, reflecting the changes we made.
            // If we simply opened and closed the presentation, the hash would not change.
            Output.WriteLine(driveItem.File.Hashes.QuickXorHash);
        }
    }

    private async Task<MemoryStream> GetContentAsync(string driveId, string itemPath)
    {
        await using Stream content =
            await Client.Drives[driveId].Root.ItemWithPath(itemPath).Content.Request().GetAsync();

        // Copy the stream and reset its position.
        // Resetting is not necessary for the Open XML SDK, but I see this as good practice.
        var memoryStream = new MemoryStream();
        await content.CopyToAsync(memoryStream);
        memoryStream.Seek(0, SeekOrigin.Begin);

        return memoryStream;
    }

    private static void ChangeContent(Stream content)
    {
        // Open the PresentationDocument, which will fail if it is corrupt.
        using PresentationDocument presentationDocument = PresentationDocument.Open(content, true);

        // Add a new CustomXmlPart with the right CustomXmlPartType.
        CustomXmlPart part = presentationDocument.PresentationPart!.AddCustomXmlPart(CustomXmlPartType.CustomXml);

        // Set the part's XML content, using some arbitrary test data.
        // Note that you must add the DocumentFormat.OpenXml.Linq NuGet package to do this.
        part.SetXElement(
            new XElement("MyCustomXmlRoot",
                new XElement("Uri", part.Uri.ToString()),
                new XElement("ContentType", part.ContentType),
                new XElement("Time", DateTimeOffset.UtcNow.ToString("O"))));

        // Note that we don't have to explicitly save the PresentationDocument.
        // This is done automatically once we leave the scope of the using statement,
        // i.e., this method.
    }

    private async Task<DriveItem> PutContentAsync(MemoryStream content, string driveId, string itemPath)
    {
        // Make sure the position is properly reset.
        content.Seek(0, SeekOrigin.Begin);

        return await Client.Drives[driveId].Root.ItemWithPath(itemPath).Content.Request()
            .PutAsync<DriveItem>(content);
    }
}

I also noticed that you used blocking rather than async calls. You should avoid that, even when creating PowerPoint add-ins.

ThomasBarnekow, May 09 '22 20:05
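Following the advice about blocking calls, a non-blocking variant of the original `OpenCloseAndSave` method might look roughly like this. It is a sketch assuming Microsoft Graph SDK v4 (request builders with `.Request()`, as used in this thread) and DocumentFormat.OpenXml 2.x, where disposing an editable document saves it. The class name `PresentationRoundTrip` and the method's parameters are hypothetical replacements for the fields and the `GetGraphServiceClient()` helper of the original code.

```csharp
using System.IO;
using System.Threading.Tasks;
using DocumentFormat.OpenXml.Packaging;
using Microsoft.Graph;

// Hypothetical async version of OpenCloseAndSave (Graph SDK v4, Open XML SDK 2.x).
public static class PresentationRoundTrip
{
    public static async Task<DriveItem> OpenSaveAndUploadAsync(
        GraphServiceClient client, string libraryName, string itemPath)
    {
        IDriveItemRequestBuilder item =
            client.Sites.Root.Lists[libraryName].Drive.Root.ItemWithPath(itemPath);

        using (var content = new MemoryStream())
        {
            // Download the file into a seekable, writable MemoryStream.
            using (Stream download = await item.Content.Request().GetAsync())
            {
                await download.CopyToAsync(content);
            }
            content.Position = 0;

            // Open for editing; disposing the document saves it back into the stream.
            using (PresentationDocument document = PresentationDocument.Open(content, true))
            {
                // Manipulate the presentation and its CustomXmlParts here.
            }

            // Rewind and upload the saved package, replacing the original file.
            content.Position = 0;
            return await item.Content.Request().PutAsync<DriveItem>(content);
        }
    }
}
```

Keeping the document's `using` block closed before the upload makes sure the package has been fully written to the stream before it is sent.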

@Kalle4242, does that solve your problem? Can we close this issue?

ThomasBarnekow, May 14 '22 10:05

@ThomasBarnekow Thank you for looking into this. I temporarily solved it with a workaround that first saves the file locally and then copies it to SharePoint. However, this costs additional time, which is why I am still interested in a solution. I'm working on something else right now, but I hope to find time this week to try your suggestion.

Kalle4242, May 16 '22 06:05

First, a question: how do I get an informative error message that says not only that something is corrupt but also what is wrong in the file? So far I only get "File contains corrupted data!", and since the file is broken, the only way to look inside is with a ZIP tool.

Kalle4242, May 16 '22 12:05

The error message I included in my first response above came from an exception thrown by the Open XML SDK. You can also view and edit Open XML packages using the Open XML Package Editor for Modern Visual Studios, for example.

Did you try it with my code?

ThomasBarnekow, May 16 '22 13:05
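To see which layer rejects a file, it helps to keep the two failure modes apart: the Open XML SDK throws `OpenXmlPackageException` with a descriptive message (like the content-type error quoted earlier in this thread), whereas ZIP-level damage surfaces from System.IO.Packaging, typically as a `FileFormatException` with the generic "File contains corrupted data." message, whose stack trace at least names the failing check. A minimal sketch, where `OpenDiagnostics.TryOpen` is a hypothetical helper, opens the document with `AutoSave` disabled so the probe does not rewrite the stream:

```csharp
using System;
using System.IO;
using DocumentFormat.OpenXml.Packaging;

// Hypothetical helper: tries to open a presentation and returns either "OK" or the
// most detailed error text available, without saving anything back into the stream.
public static class OpenDiagnostics
{
    public static string TryOpen(Stream stream, bool isEditable)
    {
        try
        {
            // AutoSave = false keeps the probe from rewriting the package on dispose.
            using (PresentationDocument.Open(stream, isEditable, new OpenSettings { AutoSave = false }))
            {
                return "OK";
            }
        }
        catch (OpenXmlPackageException ex)
        {
            // Raised by the Open XML SDK itself, e.g. for a part with an unexpected content type.
            return "Open XML SDK: " + ex.Message;
        }
        catch (Exception ex)
        {
            // ZIP-level damage usually comes from System.IO.Packaging (for example a
            // FileFormatException). ToString() includes the type, inner exceptions and
            // the stack trace, which names the failing check.
            return ex.ToString();
        }
    }
}
```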

@ThomasBarnekow I also encountered this error. How can I see the error message you included above? Does Open XML support files with the .ppt extension? sample2.ppt

Wenfengcheng, May 17 '22 08:05
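Regarding the .ppt question: the Open XML SDK only handles Open XML packages (.pptx, .pptm and so on), which are ZIP files; legacy binary .ppt files are OLE compound files and cannot be opened with it. A small sketch that checks a file's leading bytes before calling the SDK (the names `FileKind` and `OfficeFileDetector` are hypothetical):

```csharp
using System.IO;

// Hypothetical helper: classifies a file by its leading bytes.
public enum FileKind { Unknown, ZipPackage, OleCompoundFile }

public static class OfficeFileDetector
{
    // "PK\x03\x04": local file header signature at the start of a ZIP file (.pptx, .docx, .xlsx).
    private static readonly byte[] ZipSignature = { 0x50, 0x4B, 0x03, 0x04 };
    // OLE compound file signature used by legacy .ppt/.doc/.xls files
    // (and also by password-protected Open XML files).
    private static readonly byte[] OleSignature = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    public static FileKind Detect(Stream stream)
    {
        var header = new byte[8];
        int read = 0;
        while (read < header.Length)
        {
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0) break; // end of stream
            read += n;
        }

        if (StartsWith(header, read, ZipSignature)) return FileKind.ZipPackage;
        if (StartsWith(header, read, OleSignature)) return FileKind.OleCompoundFile;
        return FileKind.Unknown;
    }

    private static bool StartsWith(byte[] data, int length, byte[] prefix)
    {
        if (length < prefix.Length) return false;
        for (int i = 0; i < prefix.Length; i++)
        {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

Because encrypted Open XML files are also stored as OLE compound files, an OLE signature does not always mean the legacy binary format; it does mean the Open XML SDK cannot open the file as is.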