Open-XML-SDK

Is it possible to check if a file is a valid document and not some other file type renamed to .docx or some other office extension?

Open • 06needhamt opened this issue 4 years ago • 6 comments

Is this a:

  • [ ] Issue with the OpenXml library
  • [X] Question on library usage

Description

Is it possible to check whether a file is a valid document and not some other file type renamed to .docx or another Office extension? I know this may be off-topic here, but it is becoming an increasing problem for me, as people are renaming other files (e.g., videos) to .docx and uploading them to my servers.

Information

  • .NET Target: .NET Core 3.1
  • DocumentFormat.OpenXml Version: Any version

06needhamt, Nov 02 '20 20:11

Hi,

I don't know if there is an official way to do it, but the easiest way I can think of is to quickly open and close the document in a try block using WordprocessingDocument.Open. If it fails, put something in the catch block to tell the user that it is not a valid .docx file.
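A minimal sketch of that try/catch approach, assuming the DocumentFormat.OpenXml package is referenced. `CanOpenAsWordprocessingDocument` is a hypothetical helper name, and the exact exception types thrown for bad input can vary between SDK and System.IO.Packaging versions, so the filter below is a best guess rather than an exhaustive list:

```csharp
using System;
using System.IO;
using DocumentFormat.OpenXml.Packaging;

namespace Sample
{
    public static class OpenChecks
    {
        // Hypothetical helper (not an SDK API): true if the SDK can open the file
        // as a WordprocessingDocument, false if opening fails with a format/IO error.
        public static bool CanOpenAsWordprocessingDocument(string path)
        {
            try
            {
                // isEditable: false opens read-only; the using block closes the file again.
                using (WordprocessingDocument.Open(path, false))
                {
                    return true;
                }
            }
            catch (Exception ex) when (
                ex is OpenXmlPackageException      // a package, but not a valid Word document
                || ex is FileFormatException       // not a ZIP/OPC package at all
                || ex is InvalidDataException      // raw ZIP error, depending on version
                || ex is IOException)              // missing or locked file
            {
                return false;
            }
        }
    }
}
```

Catching only the format and I/O exceptions listed, rather than every Exception, keeps unrelated bugs from being reported to the user as "not a valid .docx file".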

If, for some reason, that isn't an option, you could try to open it as a Package and look for a part URI that every .docx file contains. However, this approach takes more steps.

Hope this helps.

rmboggs, Nov 02 '20 21:11

Thanks, @rmboggs. This is what I am currently doing, but I wondered whether there is (or should be) an official way to do this, as it seems like a very common use case to me. @twsouthwick @tomjebo What are your opinions on this?

06needhamt, Nov 08 '20 12:11

That process seems fine. Are you asking for a simple API? There isn't a WordprocessingDocument.IsValidDocument(Package) API at the moment, so what @rmboggs said would be the best option. If you want to propose an API, we're happy to take suggestions.

twsouthwick, Nov 10 '20 19:11

Something like this should work. I'm doing something similar in one of my projects during unit testing. You shouldn't even need the SDK to validate this way; System.IO.Packaging is enough.

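using System;
using System.IO;
using System.IO.Packaging;

namespace Sample
{
    public static class PackagingChecks
    {
        public static bool IsValidWordprocessingDocument(string path)
        {
            if (String.IsNullOrEmpty(path) || !File.Exists(path)) return false;

            // Conventional location of the main document part in .docx files.
            var uri = new Uri("/word/document.xml", UriKind.Relative);

            try
            {
                // Open read-only (the one-argument overload requests read/write access)
                // and dispose the package so the file handle is released.
                using (var pkg = Package.Open(path, FileMode.Open, FileAccess.Read))
                {
                    return pkg.PartExists(uri);
                }
            }
            catch (Exception ex) when (ex is FileFormatException || ex is InvalidDataException)
            {
                // Not a ZIP/OPC package at all, e.g. a video renamed to .docx.
                return false;
            }
        }
    }
}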

rmboggs, Nov 10 '20 19:11

> That process seems fine. Are you asking for a simple API? There isn't a WordprocessingDocument.IsValidDocument(Package) API at the moment, so what @rmboggs said would be the best option. If you want to propose an API, we're happy to take suggestions.

Thanks, this is what I was proposing, and I think we should have the same thing for spreadsheet and presentation documents too.
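Extending the idea to spreadsheets and presentations, one possible sketch (not an SDK API; `OfficeDocumentKind` and `IsPackageOfKind` are hypothetical names) follows the package-level officeDocument relationship to the main part and compares that part's content type, instead of assuming a fixed part name. It assumes System.IO.Packaging, as in the code above, and only recognizes regular Transitional .docx/.xlsx/.pptx files: macro-enabled files and templates use different main-part content types, and Strict OOXML files use a different relationship type.

```csharp
using System;
using System.IO;
using System.IO.Packaging;
using System.Linq;

namespace Sample
{
    // Hypothetical enum for illustration; not part of the Open XML SDK.
    public enum OfficeDocumentKind { Wordprocessing, Spreadsheet, Presentation }

    public static class OfficePackageChecks
    {
        // Transitional package-level relationship type that points at the main part.
        private const string OfficeDocumentRelType =
            "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument";

        // Main-part content types of regular (not macro-enabled, not template) files.
        private static string MainContentType(OfficeDocumentKind kind)
        {
            switch (kind)
            {
                case OfficeDocumentKind.Wordprocessing:
                    return "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml";
                case OfficeDocumentKind.Spreadsheet:
                    return "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml";
                case OfficeDocumentKind.Presentation:
                    return "application/vnd.openxmlformats-officedocument.presentationml.presentation.main+xml";
                default:
                    throw new ArgumentOutOfRangeException(nameof(kind));
            }
        }

        // Hypothetical helper: true if the file is an OPC package whose main part
        // (found via the officeDocument relationship) has the expected content type.
        public static bool IsPackageOfKind(string path, OfficeDocumentKind kind)
        {
            if (String.IsNullOrEmpty(path) || !File.Exists(path)) return false;
            string expected = MainContentType(kind); // outside try: bad enum values still throw

            try
            {
                using (var pkg = Package.Open(path, FileMode.Open, FileAccess.Read))
                {
                    PackageRelationship rel =
                        pkg.GetRelationshipsByType(OfficeDocumentRelType).FirstOrDefault();
                    if (rel == null || rel.TargetMode != TargetMode.Internal) return false;

                    // Package-level relationship targets resolve against the package root "/".
                    Uri partUri = PackUriHelper.ResolvePartUri(new Uri("/", UriKind.Relative), rel.TargetUri);
                    if (!pkg.PartExists(partUri)) return false;

                    return string.Equals(pkg.GetPart(partUri).ContentType, expected,
                                         StringComparison.OrdinalIgnoreCase);
                }
            }
            catch (Exception ex) when (ex is FileFormatException || ex is InvalidDataException
                                       || ex is IOException || ex is ArgumentException)
            {
                // Not a ZIP/OPC package, or a malformed relationship target.
                return false;
            }
        }
    }
}
```

Checking the content type means an .xlsx renamed to .docx is rejected because its main part has the wrong type, not merely because a conventionally named part happens to be absent.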

06needhamt, Nov 10 '20 23:11

Is this issue still open? Was anything added?

I suggest that something like IsValidWordprocessingDocument() should ensure that the document is at least a minimum viable WordprocessingDocument. For example, on top of containing the main document part (word/document.xml), that part must not be empty but must at least contain a w:document element with a w:body child element (IIRC).
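A rough sketch of that minimum-viability check, assuming System.IO.Packaging and System.Xml.Linq; `IsMinimallyViableWordprocessingDocument` is a hypothetical name. Like the comment, it assumes the main part lives at /word/document.xml, and it only recognizes the Transitional WordprocessingML namespace that Word writes by default:

```csharp
using System;
using System.IO;
using System.IO.Packaging;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

namespace Sample
{
    public static class MinimalViabilityChecks
    {
        // Transitional WordprocessingML namespace (Strict OOXML uses a different one).
        private static readonly XNamespace W =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

        // Hypothetical helper (not an SDK API): the main document part must exist,
        // be non-empty, and have a w:document root with at least one w:body child.
        public static bool IsMinimallyViableWordprocessingDocument(string path)
        {
            if (String.IsNullOrEmpty(path) || !File.Exists(path)) return false;
            var uri = new Uri("/word/document.xml", UriKind.Relative);

            try
            {
                using (var pkg = Package.Open(path, FileMode.Open, FileAccess.Read))
                {
                    if (!pkg.PartExists(uri)) return false;

                    using (Stream stream = pkg.GetPart(uri).GetStream(FileMode.Open, FileAccess.Read))
                    {
                        // An empty or non-XML part makes XDocument.Load throw XmlException.
                        XElement root = XDocument.Load(stream).Root;
                        return root != null
                            && root.Name == W + "document"
                            && root.Elements(W + "body").Any();
                    }
                }
            }
            catch (Exception ex) when (ex is FileFormatException || ex is InvalidDataException
                                       || ex is IOException || ex is XmlException)
            {
                return false;
            }
        }
    }
}
```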

The API could also offer an optional full validity check.
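For the optional full check, the SDK's OpenXmlValidator can be run over an opened document. A sketch, assuming the DocumentFormat.OpenXml package and a file that already opens successfully; `GetValidationErrors` is a hypothetical helper name:

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Validation;

namespace Sample
{
    public static class FullValidation
    {
        // Hypothetical helper (not an SDK API): opens the file read-only and returns
        // the descriptions of all validation errors the SDK reports (empty = none found).
        public static string[] GetValidationErrors(string path)
        {
            using (WordprocessingDocument doc = WordprocessingDocument.Open(path, false))
            {
                var validator = new OpenXmlValidator();
                return validator.Validate(doc)
                    .Select(error => error.Description)
                    .ToArray();
            }
        }
    }
}
```

The parameterless OpenXmlValidator constructor validates against a default Office version; the overload taking a FileFormatVersions value targets a specific one. Full validation can be considerably slower than the structural checks on large documents, so it may be best reserved for files that already pass them.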

ThomasBarnekow, Jun 16 '21 10:06