Android Package

Open BertrandCaron opened this issue 1 year ago • 9 comments

Format name Android Package

Version number Not relevant, as far as I know.

Extensions apk

MIME/Media Type Unofficial MIME type: application/vnd.android.package-archive

Description Android Package (APK) is a container file format for distribution and installation of applications for the Android operating system. It is an extension of both the Java Archive format and the ZIP container format.

Format type Aggregate

File format identification signatures The format can be identified by the presence of a file named AndroidManifest.xml at the root level of the container. Note that this signature should take priority over both the ZIP format (x-fmt/263) and the Java Archive format (x-fmt/412).
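For illustration, the root-level check described above can be sketched in a few lines of Python. This is only a sketch using the standard `zipfile` module, not the PRONOM container signature itself; the function name `has_root_manifest` is hypothetical.

```python
import zipfile


def has_root_manifest(source) -> bool:
    """Return True if the ZIP container holds AndroidManifest.xml at its root.

    `source` may be a path or a binary file-like object.
    Anything that is not a readable ZIP is reported as False.
    """
    try:
        with zipfile.ZipFile(source) as archive:
            # ZIP entry names use "/" separators and have no leading slash,
            # so a root-level entry is simply the bare file name.
            return "AndroidManifest.xml" in archive.namelist()
    except zipfile.BadZipFile:
        return False
```

Called as `has_root_manifest("sample.apk")`, this only tests for the entry name, so on its own it cannot tell an APK from any other ZIP-based format that carries the same file at the root.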

Examples can be downloaded at https://www.apkmirror.com/.

BertrandCaron avatar May 18 '24 14:05 BertrandCaron

FYI this is also fdd592: https://www.loc.gov/preservation/digital/formats/fdd/fdd000592.shtml

kmurmur avatar May 20 '24 12:05 kmurmur

@BertrandCaron There is also an Android App Bundle format (AAB), which includes the "AndroidManifest.xml" file, but not at the root, and the Android Library Projects (AAR) format also has the manifest XML file. Do you think we should include identification of these two as well for more accurate identification? The "AndroidManifest.xml" file can be plain-text XML or binary, which may complicate accurate identification.

thorsted avatar Dec 27 '24 18:12 thorsted

Hi @thorsted! Regarding AAB, that could be useful and fairly easy, because the AndroidManifest.xml file is in /base/manifest (https://developer.android.com/guide/app-bundle/app-bundle-format). From what I've seen, AndroidManifest.xml files are always binary files, and more specifically encoded in an Android XML flavor. We use https://gist.github.com/i64/b7d9d5e9c7745c276d34ac21289f6537 to decode them. Do you think we should look for a signature inside the AndroidManifest.xml file to distinguish APK from AAR?

BertrandCaron avatar Dec 28 '24 16:12 BertrandCaron

@BertrandCaron Yes, AAB should be straightforward. The AAR format appears to have the "AndroidManifest.xml" file at the root of the ZIP, and therefore might clash with APK if we only use the name of the file for the container signature. The AAR, from the few samples I have, appears to have standard text XML, while the APK uses AXML (binary XML). It might be good to add some bytes to the container signatures to distinguish between the two, or to use a separate file unique to each.
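As a rough sketch of the "add some bytes" idea, the Python snippet below guesses whether a manifest is binary or text from its first bytes. The byte sequence `03 00 08 00` is an assumption here (the little-endian XML chunk type 0x0003 followed by an 8-byte header size, as commonly seen at the start of binary manifests) and should be verified against real samples before it goes into a signature. The function name `manifest_flavor` is hypothetical.

```python
# Assumed leading bytes of an Android binary XML (AXML) manifest; to be
# confirmed against real samples before being used in a signature.
AXML_MAGIC = b"\x03\x00\x08\x00"
UTF8_BOM = b"\xef\xbb\xbf"


def manifest_flavor(data: bytes) -> str:
    """Classify manifest bytes as 'binary', 'text' or 'unknown'."""
    if data.startswith(AXML_MAGIC):
        return "binary"
    head = data[:256]
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    head = head.lstrip()  # ignore leading whitespace
    if head.startswith((b"<?xml", b"<manifest")):
        return "text"
    return "unknown"
```

The bytes would typically come from something like `zipfile.ZipFile(path).read("AndroidManifest.xml")`. Returning "unknown" rather than guessing keeps unexpected encodings (for example UTF-16 text) from being mislabeled.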

thorsted avatar Dec 30 '24 23:12 thorsted

Thanks @thorsted, I'll try to investigate a little further after next Monday!

BertrandCaron avatar Jan 03 '25 13:01 BertrandCaron

Hi @thorsted!

A few additional findings, from https://www.youtube.com/watch?v=ccdFpMC-qaE&t=190s :

  • AAR files have an AndroidManifest.xml, but it's plain XML, not Android Binary XML. So maybe we could search for the string "manifest" in the first few bytes?
  • APK files have classes.dex (Java classes compiled into a Dalvik executable, if I understand correctly), while AAR files contain Java classes in a classes.jar file.

That would help us distinguish the two formats, with AAR maybe having precedence over APK? What do you think?
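To make the heuristics discussed so far concrete, here is a hedged Python sketch that guesses the container type from entry names alone. The function name `guess_android_container` and the ordering of the checks are assumptions made for illustration (AAR is checked before APK, following the precedence floated above); the actual PRONOM signatures may well be built differently.

```python
def guess_android_container(names) -> str:
    """Guess 'AAB', 'AAR', 'APK', 'APK or AAR' or 'unknown' from ZIP entry names."""
    entries = set(names)
    # AAB: the manifest sits under base/manifest/ rather than at the root.
    if "base/manifest/AndroidManifest.xml" in entries:
        return "AAB"
    if "AndroidManifest.xml" in entries:
        # AAR is checked first: its classes live in classes.jar.
        if "classes.jar" in entries:
            return "AAR"
        # APK: classes are compiled into classes.dex.
        if "classes.dex" in entries:
            return "APK"
        # Only the manifest is present, so the names alone cannot decide.
        return "APK or AAR"
    return "unknown"
```

The names can be obtained with `zipfile.ZipFile(path).namelist()`. When neither classes.dex nor classes.jar is present, the sketch deliberately reports the ambiguity; inspecting the manifest bytes would be one way to settle such cases.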

BertrandCaron avatar Jan 21 '25 16:01 BertrandCaron

From https://developer.android.com/studio/projects/android-library.html#aar-contents :

The only mandatory entry is /AndroidManifest.xml.

An AAR file can also include one or more of the following optional entries:

/classes.jar

BertrandCaron avatar Jan 30 '25 13:01 BertrandCaron

Examples of AAR files can be downloaded at https://search.maven.org/search?q=.aar.

BertrandCaron avatar Jan 30 '25 18:01 BertrandCaron

I worked with @thorsted on Android application signatures, and he came up with three signatures, for:

  • Android Package (APK) - the end-user distribution format for Android applications;
  • Android App Bundle (AAB) - the format in which developers upload their applications to the Google Play Store;
  • Android Archive (AAR) - the format of Android libraries that can be integrated into other Android applications.

I tested the signature files on my samples, and Tyler tested them on his corpus, to check for false positives.

Signature files can be found here: https://github.com/BertrandCaron/signatures/tree/main/android_applications

BertrandCaron avatar Feb 07 '25 16:02 BertrandCaron