Open-XML-SDK
Is it possible to check if a file is a valid document and not some other file type renamed to .docx or some other office extension?
Is this a:
- [ ] Issue with the OpenXml library
- [X] Question on library usage
Description
Is it possible to check if a file is a valid document and not some other file type renamed to .docx or some other Office extension? I know this may be off-topic here, but it is becoming an increasing problem for me: people are renaming other files (e.g. videos) to .docx and uploading them to my servers.
Information
- .NET Target: .NET Core 3.1
- DocumentFormat.OpenXml Version: Any version
Hi,
I don't know if there is an official way to do it, but the easiest way that I can think of is to quickly open and close the document in a try block using WordprocessingDocument.Open. If it fails, put something in the catch block to tell the user that it is not a valid docx file.
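A minimal sketch of that try/open/close approach, assuming the DocumentFormat.OpenXml package is referenced. The class and method names (DocxProbe, CanOpenAsWordDocument) are made up for illustration and are not part of the SDK.

```csharp
using System;
using DocumentFormat.OpenXml.Packaging;

// Hypothetical helper, not an SDK API.
public static class DocxProbe
{
    // Returns true if the SDK can open the file as a Word document
    // and the package exposes a main document part.
    public static bool CanOpenAsWordDocument(string path)
    {
        try
        {
            // false = open read-only, so the uploaded file is never modified.
            using (var doc = WordprocessingDocument.Open(path, false))
            {
                return doc.MainDocumentPart != null;
            }
        }
        catch (Exception)
        {
            // Any failure to open (not a ZIP, not an OPC package, wrong
            // document type, unreadable file) is treated as "not a valid docx".
            return false;
        }
    }
}
```

The catch is deliberately broad because the exact exception type for a non-package file can differ between runtimes; narrow it if you need to distinguish I/O problems from format problems.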
If, for some reason, that isn't an option, you could try to open it as a Package and look for a part URI that exists in all docx files. However, this way would take more steps.
Hope this helps.
Thanks, @rmboggs. This is what I am currently doing, but I wondered whether there is (or should be) an official way to do this, as it seems like a very common use case to me. @twsouthwick @tomjebo What are your opinions on this?
That process seems fine. Are you asking for a simple API? There isn't a WordprocessingDocument.IsValidDocument(Package) API at the moment, so what @rmboggs said would be the best option. If you want to propose an API, we're happy to take suggestions.
Something like this should work. I'm doing something similar with one of my projects during unit testing. You shouldn't even need to use the SDK to validate this way.
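using System;
using System.IO;
using System.IO.Packaging;

namespace Sample
{
    public static class PackagingChecks
    {
        // True only if the file opens as an OPC package and contains the main document part.
        public static bool IsValidWordprocessingDocument(string path)
        {
            if (string.IsNullOrEmpty(path) || !File.Exists(path)) return false;
            var uri = new Uri("/word/document.xml", UriKind.Relative);
            try
            {
                // Open read-only (Package.Open(path) alone requests read/write access)
                // and dispose the package so the file handle is released.
                using (var pkg = Package.Open(path, FileMode.Open, FileAccess.Read))
                {
                    return pkg.PartExists(uri);
                }
            }
            catch (Exception)
            {
                // Not a ZIP/OPC package (e.g. a renamed video) or the file is unreadable.
                return false;
            }
        }
    }
}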
> That process seems fine. Are you asking for a simple API? There isn't a WordprocessingDocument.IsValidDocument(Package) API at the moment, so what @rmboggs said would be the best option. If you want to propose an API, we're happy to take suggestions
Thanks, this is what I was proposing, and I think we should have the same thing for Spreadsheet and Presentation documents too.
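Until such an API exists, the same open-and-check idea can be sketched for the other two document types. This assumes the DocumentFormat.OpenXml package; OfficeDocumentChecks and its methods are hypothetical names, not SDK APIs.

```csharp
using System;
using DocumentFormat.OpenXml.Packaging;

// Hypothetical helpers, not SDK APIs.
public static class OfficeDocumentChecks
{
    // True if the file opens as a spreadsheet package with a workbook part.
    public static bool IsSpreadsheetDocument(string path) =>
        Probe(() =>
        {
            using (var doc = SpreadsheetDocument.Open(path, false))
            {
                return doc.WorkbookPart != null;
            }
        });

    // True if the file opens as a presentation package with a presentation part.
    public static bool IsPresentationDocument(string path) =>
        Probe(() =>
        {
            using (var doc = PresentationDocument.Open(path, false))
            {
                return doc.PresentationPart != null;
            }
        });

    // Runs the check and maps any failure to open the file to "false".
    private static bool Probe(Func<bool> check)
    {
        try
        {
            return check();
        }
        catch (Exception)
        {
            return false;
        }
    }
}
```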
Is this issue still open? Was anything added?
I suggest that something like IsValidWordprocessingDocument() should check that the document is at least a minimum viable WordprocessingDocument, for example. On top of containing the main document part (word/document.xml), that part must not be empty and must contain at least a w:document with a w:body child element (IIRC).
The API could also offer an optional full validity check.
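A rough sketch of what such a check could look like with the current SDK, assuming the DocumentFormat.OpenXml package. WordDocumentChecks, IsMinimalWordDocument and the fullValidation parameter are hypothetical names for illustration; only WordprocessingDocument.Open, MainDocumentPart, Document.Body and OpenXmlValidator come from the SDK.

```csharp
using System;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Validation;

// Hypothetical helper, not an SDK API.
public static class WordDocumentChecks
{
    // Minimal check: a main document part whose root w:document has a w:body.
    // With fullValidation = true, the SDK validator must also report no errors.
    public static bool IsMinimalWordDocument(string path, bool fullValidation = false)
    {
        try
        {
            using (var doc = WordprocessingDocument.Open(path, false))
            {
                // Null if the main part, the w:document root, or w:body is missing.
                var body = doc.MainDocumentPart?.Document?.Body;
                if (body == null) return false;
                if (!fullValidation) return true;

                // Optional schema and semantic validation of the whole package.
                var validator = new OpenXmlValidator();
                return !validator.Validate(doc).Any();
            }
        }
        catch (Exception)
        {
            // The file could not be opened or parsed as a Word document.
            return false;
        }
    }
}
```

Full validation walks every part, so it is noticeably slower than the minimal check and is probably best kept opt-in for upload filtering.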