trufflehog
Add support for scanning APK files
Description:
APK (Android Package Kit) files are used by Android to install and distribute applications. These files are essentially zip archives with a specific directory structure for Android apps. We currently scan them as normal zip archives; however, most of the locations within an APK where secrets would live aren't properly decompiled/decoded during our regular zip file scanning. This PR adds special support to decompile APKs and then search them for secrets.
The most robust approach for searching APKs for secrets is to use a decompiler like jadx and then run TruffleHog against the output. The downsides are twofold: (1) this would require TruffleHog users to install jadx, (2) decompiling takes a while (up to several minutes) and a lot of memory. Instead of going this route, we rely on two golang libraries (dextk and apkparser) that balance functionality and performance to get us 80% of the way there without any external dependencies.
Through this PR, TruffleHog users can now scan for secrets in APK files. Here's what is specifically scanned:
XML
Android's xml files need to be decoded in order to properly scan them because the places where secrets might live are often stored as reference IDs instead of plain text strings. This PR runs an Android XML decoder that uses the resources.arsc file as context to automatically resolve most resource reference IDs into their corresponding values.
AndroidManifest.xml
This approach includes the important AndroidManifest.xml file.
Strings.xml
One of the xml files that is most likely to contain a secret is called strings.xml. This file proved a challenge because the APK compilation process transforms it, so when we run unzip file.apk, strings.xml does not appear as a plain file. A tool like jadx would easily decompile it, but since we're not using it, we had to find a different way to get at that data.
We found that the resources.arsc file houses the key/value pairs from strings.xml that might contain secrets, in the resource ID range 0x7f000000-0x7fffffff. So we iterate through all resources of type string in that range and search for secrets there. This seems to work for most scenarios, but admittedly we need more testing.
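To illustrate the range check, here is a minimal sketch. It assumes the standard Android resource ID layout of 0xPPTTEEEE (package, type, entry); the helper names inAppRange and splitResourceID are made up for this example and are not taken from the PR or from apkparser.

```go
package main

import "fmt"

// Bounds of the resource ID range named in the description.
const (
	appResourceMin uint32 = 0x7f000000
	appResourceMax uint32 = 0x7fffffff
)

// inAppRange reports whether id falls inside 0x7f000000-0x7fffffff.
// Hypothetical helper, for illustration only.
func inAppRange(id uint32) bool {
	return id >= appResourceMin && id <= appResourceMax
}

// splitResourceID breaks an ID into package, type, and entry parts,
// assuming the 0xPPTTEEEE layout.
func splitResourceID(id uint32) (pkg, typ uint8, entry uint16) {
	return uint8(id >> 24), uint8(id >> 16), uint16(id)
}

func main() {
	// One app-package ID and one framework-package ID.
	for _, id := range []uint32{0x7f0e0012, 0x01040003} {
		pkg, typ, entry := splitResourceID(id)
		fmt.Printf("0x%08x inRange=%v pkg=0x%02x type=0x%02x entry=0x%04x\n",
			id, inAppRange(id), pkg, typ, entry)
	}
}
```

Which type byte corresponds to string resources varies per APK, so a real implementation would still need to look the type name up in resources.arsc rather than hard-code it.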
Dex
A DEX file contains compiled code (it’s where the Java or Kotlin source code is transformed into bytecode). APK files generally include at least one DEX file, usually named classes.dex, but if the app is large or modular, there might be multiple DEX files—like classes2.dex, classes3.dex.
We run a golang-based Dex decompiler that helps us identify multiple relevant instruction types from within the bytecode. The one most likely to contain a secret is const-string. The rest are for providing context to our potential secret values.
The challenge is that the keyword our scanning engine needs as a cue is often located too far from the secret, because our decompilation method is lightweight and imperfect. As a result, we run keyword scanning against the decompiled code. If a keyword that we support is found, we then append that keyword to every value (read: every const-string instruction value) and toss it in for scanning. This keeps us from losing coverage and is similar in implementation to our work on Postman.
Note: Since we can't get all of the scanner keywords via the engine pkg (import issues) like we did for Postman, we create a separate file named apk_keywords.go. In an ideal world, defaults.go is ripped out of engine and moved into pkg/detectors, so that we don't need to have the same data listed in two places.
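The keyword-pairing step for DEX output can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: the regex, the "keyword:value" output format, the sample decompiled text, and the function name pairKeywords are all hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// reConstString is a hypothetical pattern for a decompiled const-string
// line such as: const-string v0, "value"
var reConstString = regexp.MustCompile(`const-string(?:/jumbo)?\s+\w+,\s*"([^"]*)"`)

// pairKeywords finds which keywords occur anywhere in the decompiled text,
// then pairs each found keyword with every const-string value so that a
// keyword-driven detector sees the keyword next to the candidate secret.
func pairKeywords(decompiled string, keywords []string) []string {
	lower := strings.ToLower(decompiled)
	var found []string
	for _, kw := range keywords {
		if strings.Contains(lower, strings.ToLower(kw)) {
			found = append(found, kw)
		}
	}
	var out []string
	for _, m := range reConstString.FindAllStringSubmatch(decompiled, -1) {
		for _, kw := range found {
			out = append(out, kw+":"+m[1])
		}
	}
	return out
}

func main() {
	// Made-up decompiler output; the real format may differ.
	dex := "const-string v0, \"LokaliseClient\"\nconst-string v1, \"abc123\"\n"
	for _, candidate := range pairKeywords(dex, []string{"lokalise", "stripe"}) {
		fmt.Println(candidate)
	}
}
```

The output grows with (keywords found) x (const-string values), which is the cost of this approach compared with a full decompile that preserves the original context.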
Everything else
All other files are just read in like normal and passed to a chunk for scanning. We likely won't see many secrets from these files, but it's worth a review. Examples of these types of files are: .json, .properties, etc.
Checklist:
- [x] Tests passing (make test-community)?
- [ ] Lint passing (make lint; this requires golangci-lint)?
Some other common XML files that can probably be safely excluded:
"error": "failed to decode xml file third_party/java_src/error_prone/project/annotations/Google_internal.gwt.xml: Chunk: 0x00006d20: Unknown chunk id 0x6d20"}
"error": "failed to decode xml file third_party/java_src/error_prone/project/annotations/Annotations.gwt.xml: Chunk: 0x00006d20: Unknown chunk id 0x6d20"}
"error": "failed to decode xml file jsr305_annotations/Jsr305_annotations.gwt.xml: Chunk: 0x00006d20: Unknown chunk id 0x6d20"}
https://github.com/smlbiobot/cr/blob/master/apk/2.0.1/com.supercell.clashroyale-2.0.1.decoded/unknown/third_party/java_src/error_prone/project/annotations/Google_internal.gwt.xml
https://github.com/google/guava/blob/master/guava-gwt/src/com/google/common/annotations/Annotations.gwt.xml
https://github.com/Goujer/kanojo_app/blob/72c8374c139e16d528bdbdd16a274851da68d753/app/src/main/jsr305_annotations/Jsr305_annotations.gwt.xml#L3
Actual Filename should also be added to the output.
I ran this against an Android app that I know has an intentionally hard-coded API key for LokaliseToken, which gets detected if we decompile the .apk file using jadx and then scan its output. But when I scanned the same apk file using this code, it wasn't detected.
The API key was in the .dex file, which was correctly handled ✔️
The reConstString regex is correct and was detecting the line ✔️
The parseDexInstructions was correctly handling it ✔️
But it's not getting detected because, like the majority of the trufflehog detectors, this one relies on keywords. After the dex file is processed, we only have the API key with no other text around it, so detection fails.
The decompiled code
@bugbaba is there any way you could share that apk file?
@joeleonjr Couldn't find you in the discord server. Please ping ben10_01 on discord or nomanAli181 on twitter.
@bugbaba Please give this new implementation a try. Basically, we followed our process for Postman scanning and are now placing relevant keywords close to all const-string values, so that we don't miss as much. It's still not perfect, but should be better and still performant.
Note to reviewers: Please look at how we handle an error caused by calling h.processAPK(). Specifically, I thought it would make sense to call newArchiveHandler().HandleFile() if the APK parsing failed. That would ensure any file that is handled as an .apk would still be processed as a .zip in case it was mislabeled by the user (shouldn't happen, but who knows).
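For reviewers who want the shape of that fallback without opening the diff, here is a rough sketch. The two function arguments are stand-ins for h.processAPK() and newArchiveHandler().HandleFile(); their real signatures are not shown in this thread, so the func() error type, the handleAPK name, and the errInvalidAPK value are all assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// errInvalidAPK is a made-up error used only to drive the example.
var errInvalidAPK = errors.New("not a valid APK")

// handleAPK tries the APK-specific path first. If that fails, it falls back
// to generic archive handling so a mislabeled zip is still scanned.
// Both arguments are hypothetical stand-ins for the real handlers.
func handleAPK(processAPK, handleArchive func() error) (usedFallback bool, err error) {
	apkErr := processAPK()
	if apkErr == nil {
		return false, nil
	}
	if archErr := handleArchive(); archErr != nil {
		// Report both failures so neither is silently dropped.
		return true, fmt.Errorf("apk: %v; archive fallback: %w", apkErr, archErr)
	}
	return true, nil
}

func main() {
	usedFallback, err := handleAPK(
		func() error { return errInvalidAPK }, // APK parsing fails
		func() error { return nil },           // zip handling succeeds
	)
	fmt.Println(usedFallback, err)
}
```

One thing to watch with this pattern: if processAPK has already emitted chunks before failing, the fallback may rescan and re-emit some of the same content.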
It is now able to detect the key for that specific apk file.
But is there any way to avoid having the pkg/handlers/apk_keywords.go file? It becomes another file to maintain with each detector change.
We should add a feature flag for this called EnableAPKHandler which would be off by default (empty bool value) in the feature package: https://github.com/trufflesecurity/trufflehog/blob/main/pkg/feature/feature.go
We can turn it on by default in OSS, and when it's imported into Enterprise it will be off by default unless we override it with a feature flag.
Joe, let me know if you want to sync on how this works.
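A minimal sketch of how such a flag could behave, assuming it is declared as an atomic.Bool whose zero value is false. The pickHandler function and the handler names it returns are hypothetical and only illustrate the gating.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// EnableAPKHandler is off by default: the zero value of atomic.Bool is false.
// In the real project it would live in the feature package.
var EnableAPKHandler atomic.Bool

// pickHandler is a hypothetical dispatcher: APK files get the APK handler
// only when the flag is on; otherwise they are treated as regular archives.
func pickHandler(ext string) string {
	if ext == ".apk" && EnableAPKHandler.Load() {
		return "apk"
	}
	return "archive"
}

func main() {
	fmt.Println(pickHandler(".apk")) // flag unset, falls through to archive

	// An OSS entry point could flip the flag once at startup.
	EnableAPKHandler.Store(true)
	fmt.Println(pickHandler(".apk"))
}
```

Because the default is the zero value, anything that imports the package without touching the flag keeps the old zip-only behavior.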
Great job, @joeleonjr! 🥳 This looks like it was quite the project to get working. I’m excited to see it in action—and to handle the inevitable user question about verifying findings we found in their .apk file. 🤣
Can trufflehog be used to scan APK files?
Yes, that's correct. Feel free to check out our blog for additional details.
trufflehog should improve its regexes for finding API keys and other secrets. I tested it on an apk that contains API keys, but it failed to find them.