iWorkFileFormat
Identify file type
Hello, I wonder if there is an easy way to identify the iWork file type, because the `file` command can only tell that it's a zip archive. I don't need to extract the file content. Any suggestions?
Besides the file extensions?
In Objective-C/Swift you could also use NSWorkspace's `type(ofFile:)` method. However, this method proved quite unreliable for me, and like ccharlton I prefer using the extension.
Checking the file extension is a little weak. I'd like to block sensitive information from leaking outside, so I need to send each type of file to the proper recognition engine. Someone could simply remove the file extension or change it to something else to circumvent examination. NSWorkspace's `type(ofFile:)` method seems to behave like the `file` command, which only reports that it's a zip archive.
This sounds tricky. I guess you will need to dig into the file, e.g. to check if the contents of the zip archive conform to the file format. Maybe a good starting point:
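As a cheap first filter, one can check that the zip archive actually contains `Index/Document.iwa`, the entry used later in this thread. This is a minimal sketch using only Python's standard `zipfile` module; the function name is hypothetical, and it assumes the single-file (zip) form of the document rather than a package directory. It only says "probably iWork" and cannot tell Pages, Keynote, and Numbers apart.

```python
import zipfile

# Entry that holds the root DocumentArchive (per this thread).
DOCUMENT_IWA = "Index/Document.iwa"


def looks_like_iwork_zip(path_or_file) -> bool:
    """Return True if the input is a zip archive containing Index/Document.iwa.

    Accepts a path or a binary file-like object. This is only a pre-filter:
    it does not parse the .iwa data and cannot distinguish the three apps.
    """
    if not zipfile.is_zipfile(path_or_file):
        return False
    try:
        with zipfile.ZipFile(path_or_file) as zf:
            return DOCUMENT_IWA in zf.namelist()
    except zipfile.BadZipFile:
        return False
```

A renamed `.docx` or any other zip passes `is_zipfile` but fails the entry check, which is the case the extension check alone misses.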
https://developer.apple.com/library/archive/documentation/FileManagement/Conceptual/FileSystemProgrammingGuide/FileSystemOverview/FileSystemOverview.html
Maybe this thread on using the `mdls` command is also interesting:
https://superuser.com/questions/323599/is-it-possible-to-query-the-launch-services-database-for-applications-that-will
The only way to determine the file format is to speculatively parse the file. Blanking all of the plists and stripping all non-.iwa files still leaves a document that opens correctly in its application, so those files cannot be relied on for identification.
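Since only the .iwa files matter, speculative parsing starts by undoing their framing. The sketch below assumes the layout described in this repository's notes: an .iwa file is a sequence of chunks, each a `0x00` byte and a 3-byte little-endian length followed by a raw Snappy block, with no CRC and no stream identifier. To stay dependency-free it hand-rolls a minimal Snappy block decoder; a maintained Snappy library would be the better choice in practice. All names are hypothetical.

```python
def _read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a little-endian base-128 varint; return (value, next_pos)."""
    result = shift = 0
    while True:
        b = buf[pos]  # raises IndexError on truncated input
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def snappy_decompress(block: bytes) -> bytes:
    """Decompress one raw Snappy block (no framing, no CRC)."""
    expected, pos = _read_varint(block, 0)
    out = bytearray()
    while pos < len(block):
        tag = block[pos]
        pos += 1
        kind = tag & 3
        if kind == 0:  # literal: length-1 in the tag, or in 1..4 extra bytes
            n = tag >> 2
            if n >= 60:
                extra = n - 59
                n = int.from_bytes(block[pos:pos + extra], "little")
                pos += extra
            n += 1
            out += block[pos:pos + n]
            pos += n
            continue
        if kind == 1:  # copy: 3-bit length field, 11-bit offset
            length = 4 + ((tag >> 2) & 7)
            offset = ((tag >> 5) << 8) | block[pos]
            pos += 1
        elif kind == 2:  # copy: 6-bit length field, 2-byte offset
            length = 1 + (tag >> 2)
            offset = int.from_bytes(block[pos:pos + 2], "little")
            pos += 2
        else:  # copy: 6-bit length field, 4-byte offset
            length = 1 + (tag >> 2)
            offset = int.from_bytes(block[pos:pos + 4], "little")
            pos += 4
        if not 0 < offset <= len(out):
            raise ValueError("invalid Snappy copy offset")
        for _ in range(length):  # byte by byte, because copies may overlap
            out.append(out[-offset])
    if len(out) != expected:
        raise ValueError("Snappy length mismatch")
    return bytes(out)


def deframe_iwa(data: bytes) -> bytes:
    """Concatenate the decompressed chunks of an .iwa file.

    Assumed chunk layout: 0x00, 3-byte little-endian length, Snappy block.
    """
    out = bytearray()
    pos = 0
    while pos < len(data):
        if data[pos] != 0:
            raise ValueError("unexpected IWA chunk header")
        n = int.from_bytes(data[pos + 1:pos + 4], "little")
        pos += 4
        out += snappy_decompress(data[pos:pos + n])
        pos += n
    return bytes(out)
```

A file that fails here (wrong header byte, bad offsets, length mismatch) is a strong hint that it is not an iWork document at all.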
In `/Index/Document.iwa`, the root `DocumentArchive` message is always of type 1 (and always message index 1). This message alone is sufficient to tell the three apps apart.
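Locating that message in the de-framed `Document.iwa` stream can be sketched as follows. It assumes the stream is a sequence of `ArchiveInfo` messages, each preceded by a varint length and followed by the payloads listed in its `message_infos`, and it reads "message index 1" as the `ArchiveInfo` identifier. The field numbers used (`ArchiveInfo.identifier` = 1, `message_infos` = 2, `MessageInfo.type` = 1, `length` = 3) should be checked against the dumped `TSPArchiveMessages.proto`; function names are hypothetical.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a protobuf base-128 varint; return (value, next_pos)."""
    result = shift = 0
    while True:
        b = buf[pos]  # raises IndexError on truncated input
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def iter_fields(buf: bytes):
    """Yield (field_number, wire_type, value) for each top-level field.

    Varint values are ints; all other supported wire types yield raw bytes.
    """
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field, wire = key >> 3, key & 7
        if wire == 0:
            value, pos = read_varint(buf, pos)
        else:
            if wire == 1:
                n = 8
            elif wire == 2:
                n, pos = read_varint(buf, pos)
            elif wire == 5:
                n = 4
            else:
                raise ValueError(f"unsupported wire type {wire}")
            if pos + n > len(buf):
                raise ValueError("truncated field")
            value, pos = buf[pos:pos + n], pos + n
        yield field, wire, value


def find_message(stream: bytes, identifier: int = 1, msg_type: int = 1) -> bytes:
    """Return the payload of the message of type `msg_type` in the archive
    whose identifier is `identifier`, from a de-framed IWA stream."""
    pos = 0
    while pos < len(stream):
        info_len, pos = read_varint(stream, pos)
        info, pos = stream[pos:pos + info_len], pos + info_len
        archive_id, messages = None, []
        for field, wire, value in iter_fields(info):
            if field == 1 and wire == 0:      # ArchiveInfo.identifier
                archive_id = value
            elif field == 2 and wire == 2:    # ArchiveInfo.message_infos
                info_fields = {f: v for f, w, v in iter_fields(value) if w == 0}
                if 3 not in info_fields:      # MessageInfo.length
                    raise ValueError("MessageInfo without length")
                messages.append((info_fields.get(1), info_fields[3]))  # (type, length)
        # Payloads follow the ArchiveInfo in the order they are listed.
        for mtype, mlen in messages:
            payload, pos = stream[pos:pos + mlen], pos + mlen
            if archive_id == identifier and mtype == msg_type:
                return payload
    raise LookupError("message not found")
```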
In the 11.2 versions of the apps, the required fields of this message are:
```protobuf
// Keynote optional fields 4
message DocumentArchive {
  required .TSA.DocumentArchive super = 3;
  required .TSP.Reference show = 2;
}

// Numbers optional fields 1, 3, 7, 9, 10, 11, 12
message DocumentArchive {
  required .TSA.DocumentArchive super = 8;
  required .TSP.Reference stylesheet = 4;
  required .TSP.Reference sidebar_order = 5;
  required .TSP.Reference theme = 6;
}

// Pages optional fields 2 - 7, 11 - 14, 16, 17, 20, 21, 30 - 49
message DocumentArchive {
  required .TSA.DocumentArchive super = 15;
}
```
So the following suffices:
- find `Index/Document.iwa` in the container, de-frame it, and find message 1 of type 1
- do a shallow parse of the protobuf message:
  - if field 15 is present: file is of type "PAGES"
  - else if field 2 is present: file is of type "KEYNOTE"
  - else: file is of type "NUMBERS"
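The shallow parse in the last step needs no schema: it walks the top-level keys of the message, records each field number, and skips each value according to its wire type. Field 15 is checked first because Pages may also carry the optional field 2. A sketch, assuming `msg` is already the payload of message 1 of type 1 from the de-framed `Index/Document.iwa`; names are hypothetical.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a protobuf base-128 varint; return (value, next_pos)."""
    result = shift = 0
    while True:
        b = buf[pos]  # raises IndexError on truncated input
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def top_level_fields(msg: bytes) -> set[int]:
    """Collect the top-level field numbers of a protobuf message,
    skipping over the values without decoding them."""
    fields = set()
    pos = 0
    while pos < len(msg):
        key, pos = read_varint(msg, pos)
        field, wire = key >> 3, key & 7
        if wire == 0:
            _, pos = read_varint(msg, pos)
        elif wire == 1:
            pos += 8
        elif wire == 2:
            n, pos = read_varint(msg, pos)
            pos += n
        elif wire == 5:
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire}")
        if pos > len(msg):
            raise ValueError("truncated field")
        fields.add(field)
    return fields


def classify_document_archive(msg: bytes) -> str:
    """Apply the decision rule from the list above."""
    fields = top_level_fields(msg)
    if 15 in fields:
        return "PAGES"
    if 2 in fields:
        return "KEYNOTE"
    return "NUMBERS"
```

Because nothing below the top level is decoded, the check stays cheap and tolerant of schema changes inside the nested `super` archive.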
@obriensp, feel free to add this to the README if you are still updating it or are interested. Most projects in the iWork file format ecosystem still refer to these notes.