Add support for scanning APK files

Open joeleonjr opened this issue 1 year ago • 9 comments

Description:

APK (Android Package Kit) files are used by Android to install and distribute applications. These files are essentially zip archives with a specific directory structure for Android apps. We currently scan them as normal zip archives; however, most of the locations within an APK where secrets would live aren't properly decompiled or decoded during our regular zip file scanning. This PR adds special support to decompile APKs and then search them for secrets.

The most robust approach for searching APKs for secrets is to use a decompiler like jadx and then run TruffleHog against the output. The downsides are twofold: (1) this would require TruffleHog users to install jadx, and (2) decompiling takes a while (up to several minutes) and uses a lot of memory. Instead of going this route, we rely on two Go libraries (dextk and apkparser) that balance functionality and performance to get us 80% of the way there without any external dependencies.

Through this PR, TruffleHog users can now scan for secrets in APK files. Here's what is specifically scanned:

XML

Android's XML files need to be decoded in order to properly scan them because the places where secrets might live are often stored as reference IDs instead of plain-text strings. This PR runs an Android XML decoder that uses the resources.arsc file as context to automatically resolve most resource reference IDs into their corresponding values.

AndroidManifest.xml

This approach includes the important AndroidManifest.xml file.

Strings.xml

One of the XML files most likely to contain a secret is strings.xml. This file proved a challenge because, during APK compilation, it is transformed in such a way that running unzip file.apk does not produce a readable strings.xml file. A tool like jadx would easily decompile it, but since we're not using it, we had to find a different way to get at that data.

We found that the resources.arsc file houses the key/value pairs from strings.xml that might contain secrets, in the resource ID range 0x7f000000-0x7fffffff. So we iterate through all resources of type string in that range and search for secrets there. This seems to work for most scenarios, but admittedly it needs more testing.
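To make the ID range concrete, here is a minimal sketch of the range check. It relies on the standard Android resource ID layout (0xPPTTEEEE: package, type, entry), where package 0x7f is the application's own package. The function names are hypothetical and are not taken from this PR or from apkparser.

```go
package main

import "fmt"

// Android resource IDs are laid out as 0xPPTTEEEE:
// PP = package ID, TT = type ID, EEEE = entry index.
// Package ID 0x7f is the one used for the application's own resources.
const appPackageID = 0x7f

// isAppResourceID reports whether id falls in 0x7f000000-0x7fffffff.
// (Hypothetical helper, for illustration only.)
func isAppResourceID(id uint32) bool {
	return id>>24 == appPackageID
}

// splitResourceID breaks an ID into its package, type and entry parts.
func splitResourceID(id uint32) (pkg, typ uint8, entry uint16) {
	return uint8(id >> 24), uint8(id >> 16), uint16(id)
}

func main() {
	// One app resource ID and one framework (package 0x01) resource ID.
	for _, id := range []uint32{0x7f0e0012, 0x01040000} {
		pkg, typ, entry := splitResourceID(id)
		fmt.Printf("0x%08x app=%t pkg=0x%02x type=0x%02x entry=0x%04x\n",
			id, isAppResourceID(id), pkg, typ, entry)
	}
}
```

Which type ID corresponds to "string" varies per APK, so a real implementation would resolve the type name from resources.arsc rather than hard-coding it.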

Dex

A DEX file contains compiled code (the bytecode that Java or Kotlin source code is compiled into). APK files generally include at least one DEX file, usually named classes.dex, but if the app is large or modular, there might be multiple DEX files, such as classes2.dex and classes3.dex.
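As an illustration of the naming convention, the sketch below lists root-level DEX entries in an APK using only the standard library. The helper names and the regular expression are assumptions made for this example; the PR's actual file selection is not shown in this description.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"regexp"
)

// dexName matches classes.dex, classes2.dex, classes3.dex, ... at the archive root.
// (Assumed pattern; it would also accept the unconventional classes1.dex.)
var dexName = regexp.MustCompile(`^classes\d*\.dex$`)

// dexEntries returns the names of root-level DEX files in a zip/APK reader.
func dexEntries(r *zip.Reader) []string {
	var names []string
	for _, f := range r.File {
		if dexName.MatchString(f.Name) {
			names = append(names, f.Name)
		}
	}
	return names
}

// buildZip creates an in-memory archive with empty entries, for demonstration only.
func buildZip(names ...string) (*zip.Reader, error) {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	for _, n := range names {
		if _, err := w.Create(n); err != nil {
			return nil, err
		}
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return zip.NewReader(bytes.NewReader(buf.Bytes()), int64(buf.Len()))
}

func main() {
	r, err := buildZip("AndroidManifest.xml", "classes.dex", "classes2.dex", "resources.arsc")
	if err != nil {
		panic(err)
	}
	fmt.Println(dexEntries(r))
}
```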

We run a Go-based DEX decompiler that helps us identify several relevant instruction types within the bytecode. The one most likely to contain a secret is const-string. The rest provide context for our potential secret values.
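For readers unfamiliar with const-string, here is a rough sketch of pulling string literals out of disassembled bytecode. It assumes smali-style text lines such as `const-string v0, "value"`; the real output format of the decompiler and the PR's own regex are not shown in this description, so both the pattern and the function name below are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// reConstStringSketch captures the quoted operand of const-string and
// const-string/jumbo instructions, assuming smali-style disassembly text.
var reConstStringSketch = regexp.MustCompile(`const-string(?:/jumbo)?\s+\w+,\s*"((?:[^"\\]|\\.)*)"`)

// constStrings returns every string literal loaded by a const-string instruction.
func constStrings(disasm string) []string {
	var out []string
	for _, m := range reConstStringSketch.FindAllStringSubmatch(disasm, -1) {
		out = append(out, m[1]) // m[1] is the text between the quotes
	}
	return out
}

func main() {
	disasm := `
    const-string v0, "api_token"
    invoke-static {v0}, Lcom/example/Config;->get(Ljava/lang/String;)V
    const-string/jumbo v1, "abc123"
`
	for _, s := range constStrings(disasm) {
		fmt.Println(s)
	}
}
```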

The challenge is that the keyword our scanning engine needs as a cue is often located too far from the secret, because our decompilation method is lightweight and imperfect. As a result, we run keyword scanning against the decompiled code. If a keyword that we support is found, we append that keyword to every value (that is, every const-string instruction value) and submit it for scanning. This ensures adequate coverage and is similar in implementation to our work on Postman.
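The keyword pairing described above can be sketched roughly as follows. The keyword list, the `keyword:value` output format, and the function names are all assumptions for illustration; the PR's real keyword list lives in apk_keywords.go and its exact chunk format is not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// foundKeywords returns the keywords (from a hypothetical list) that occur
// anywhere in the decompiled text, compared case-insensitively.
func foundKeywords(decompiled string, keywords []string) []string {
	lower := strings.ToLower(decompiled)
	var hits []string
	for _, kw := range keywords {
		if strings.Contains(lower, strings.ToLower(kw)) {
			hits = append(hits, kw)
		}
	}
	return hits
}

// withKeywords places every matched keyword next to every const-string value,
// so a keyword-gated detector sees the keyword close to each candidate secret.
// The "keyword:value" layout is an assumption, not the PR's format.
func withKeywords(values, hits []string) []string {
	var out []string
	for _, v := range values {
		for _, kw := range hits {
			out = append(out, kw+":"+v)
		}
	}
	return out
}

func main() {
	decompiled := "invoke-static {v0}, Lcom/lokalise/sdk/Lokalise;->init(Ljava/lang/String;)V"
	hits := foundKeywords(decompiled, []string{"lokalise", "stripe"})
	for _, line := range withKeywords([]string{"abc123", "hello"}, hits) {
		fmt.Println(line)
	}
}
```

Note that the output grows with the number of values times the number of matched keywords, which is the price paid for not needing accurate locality information from the decompiler.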

Note: Since we can't get all of the scanner keywords via the engine pkg (import issues) like we did for Postman, we create a separate file named apk_keywords.go. In an ideal world, defaults.go is ripped out of engine and moved into pkg/detectors, so that we don't need to have the same data listed in two places.

Everything else

All other files are read in as usual and passed to a chunk for scanning. We likely won't see many secrets in these files, but they're worth a review. Examples include .json and .properties files.

Checklist:

  • [x] Tests passing (make test-community)?
  • [ ] Lint passing (make lint; this requires golangci-lint)?

joeleonjr avatar Oct 28 '24 14:10 joeleonjr

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Oct 28 '24 14:10 CLAassistant

Some other common XML files that can probably be safely excluded:

"error": "failed to decode xml file third_party/java_src/error_prone/project/annotations/Google_internal.gwt.xml: Chunk: 0x00006d20: Unknown chunk id 0x6d20"}
"error": "failed to decode xml file third_party/java_src/error_prone/project/annotations/Annotations.gwt.xml: Chunk: 0x00006d20: Unknown chunk id 0x6d20"}
"error": "failed to decode xml file jsr305_annotations/Jsr305_annotations.gwt.xml: Chunk: 0x00006d20: Unknown chunk id 0x6d20"}

https://github.com/smlbiobot/cr/blob/master/apk/2.0.1/com.supercell.clashroyale-2.0.1.decoded/unknown/third_party/java_src/error_prone/project/annotations/Google_internal.gwt.xml
https://github.com/google/guava/blob/master/guava-gwt/src/com/google/common/annotations/Annotations.gwt.xml
https://github.com/Goujer/kanojo_app/blob/72c8374c139e16d528bdbdd16a274851da68d753/app/src/main/jsr305_annotations/Jsr305_annotations.gwt.xml#L3

rgmz avatar Oct 28 '24 20:10 rgmz

Actual Filename should also be added to the output.

image

bugbaba avatar Oct 31 '24 02:10 bugbaba

I ran this against an Android app that I know has an intentionally hard-coded API key for LokaliseToken. The key is detected if we decompile the .apk file with jadx and scan the output, but when I scanned the same APK file with this code, it wasn't detected.

The API key was in the .dex file, which was correctly handled ✔️

image The reConstString regex is correct and was detecting the line ✔️

image The parseDexInstructions function was correctly handling it ✔️

But the key isn't detected because, like the majority of TruffleHog detectors, this one relies on keywords, and after the DEX file is processed we have only the API key with no other text around it, so detection fails.

image The decompiled code

bugbaba avatar Oct 31 '24 03:10 bugbaba

@bugbaba is there any way you could share that APK file?

joeleonjr avatar Oct 31 '24 13:10 joeleonjr

@joeleonjr I couldn't find you on the Discord server. Please ping ben10_01 on Discord or nomanAli181 on Twitter.

bugbaba avatar Oct 31 '24 14:10 bugbaba

@bugbaba Please give this new implementation a try. Basically, we followed our process for Postman scanning and now place relevant keywords close to all const-string values, so that we don't miss as much. It's still not perfect, but it should be better while remaining performant.

joeleonjr avatar Nov 01 '24 18:11 joeleonjr

Note to reviewers: Please look at how we handle an error returned by h.processAPK(). Specifically, I thought it would make sense to call newArchiveHandler().HandleFile() if the APK parsing failed. That would ensure any file handled as an .apk is still processed as a .zip in case it was mislabeled by the user (shouldn't happen, but who knows).
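The fallback pattern in question, reduced to a standalone sketch. The `handleFunc` type, `handleAPK`, and `errNotAPK` are hypothetical stand-ins; the real h.processAPK() and HandleFile() signatures are in the diff, not reproduced here.

```go
package main

import (
	"errors"
	"fmt"
)

// handleFunc is a hypothetical stand-in for a handler that processes one file.
type handleFunc func(path string) error

// errNotAPK is a placeholder error used to simulate a failed APK parse.
var errNotAPK = errors.New("not a valid APK")

// handleAPK tries the APK-specific path first. If that fails, the file is
// handed to the generic archive handler so it is still scanned as a zip.
func handleAPK(path string, processAPK, handleArchive handleFunc) error {
	if err := processAPK(path); err != nil {
		return handleArchive(path)
	}
	return nil
}

func main() {
	failing := func(string) error { return errNotAPK }
	archive := func(p string) error {
		fmt.Println("handled as zip:", p)
		return nil
	}
	if err := handleAPK("mislabeled.apk", failing, archive); err != nil {
		fmt.Println("error:", err)
	}
}
```

One trade-off of this shape is that the original APK parsing error is discarded on fallback, so logging it before calling the archive handler is probably worthwhile.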

joeleonjr avatar Nov 01 '24 19:11 joeleonjr

It is now able to detect the key for that specific APK file. But is there any way to avoid having the pkg/handlers/apk_keywords.go file? It becomes another file to maintain with each detector change.

bugbaba avatar Nov 02 '24 05:11 bugbaba

We should add a feature flag for this called EnableAPKHandler which would be off by default (empty bool value) in the feature package: https://github.com/trufflesecurity/trufflehog/blob/main/pkg/feature/feature.go

We can turn it on by default in OSS, and when it's imported into Enterprise it will be off by default unless we override it with a feature flag.

Joe, let me know if you want to sync on how this works.
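A rough sketch of the flag behavior described above, assuming the flag is an atomic boolean whose zero value means "off". Only the name EnableAPKHandler comes from this comment; `chooseHandler` and the handler names are hypothetical.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// EnableAPKHandler mirrors the proposed flag. Its zero value is false, so any
// program importing the package gets the APK handler disabled unless it opts in.
var EnableAPKHandler atomic.Bool

// chooseHandler returns which (hypothetical) handler would process a file
// with the given extension.
func chooseHandler(ext string) string {
	if ext == ".apk" && EnableAPKHandler.Load() {
		return "apk"
	}
	if ext == ".apk" || ext == ".zip" {
		return "archive" // APKs are still scanned as plain zip archives
	}
	return "default"
}

func main() {
	fmt.Println(chooseHandler(".apk")) // flag unset: prints "archive"
	EnableAPKHandler.Store(true)       // what an OSS entry point might do at startup
	fmt.Println(chooseHandler(".apk")) // prints "apk"
}
```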

dustin-decker avatar Nov 13 '24 20:11 dustin-decker

Great job, @joeleonjr! 🥳 This looks like it was quite the project to get working. I’m excited to see it in action—and to handle the inevitable user question about verifying findings we found in their .apk file. 🤣

ahrav avatar Nov 15 '24 03:11 ahrav

Is this for using TruffleHog to scan APKs?

iamunixtz avatar Dec 05 '24 15:12 iamunixtz

Yes, that's correct. Feel free to check out our blog for additional details.

ahrav avatar Dec 05 '24 16:12 ahrav

TruffleHog should improve its regexes for finding API keys and other secrets. I tested it on an APK that contains API keys, but it failed to find them.

iamunixtz avatar Dec 05 '24 17:12 iamunixtz