binaryninja-api
binaryninja-api copied to clipboard
Add Outlining Support for strcat
What is the feature you'd like to have?
Currently, in the binary I am looking at, there are a number of string operations that are treated as raw hex numbers applied to variable offsets. It would be nice if these showed in a more 'as it would appear in the code' way, of just general string operations.
Is your feature request related to a problem?
There are other similar examples in the binary (macOS, x86_64) I'm looking at, but this is the most self contained one:
005851a0 int64_t SerumGUI::getPath_AppDataBase(int64_t arg1, int64_t* arg2)
005851b5 int64_t rax = *___stack_chk_guard
005851d4 void var_70
005851d4 if (_FSFindFolder(0xffff8005, 0x61737570, 0, &var_70) == 0)
005851dc int64_t rax_2 = _CFURLCreateFromFSRef(0, &var_70)
005851f1 _CFURLGetFileSystemRepresentation(rax_2, 0, arg2, 0x200)
005851f9 _CFRelease(rax_2)
00585201 int64_t rax_3 = _strlen(arg2)
🐛00585210 *(arg2 + rax_3) = 0x6566782e6d6f632f
🐛0058521e *(arg2 + rax_3 + 8) = 0x7364726f63657272
🐛0058522d *(arg2 + rax_3 + 0x10) = 0x2f6d757265732e
00585239 int64_t rax_4 = *___stack_chk_guard
00585240 if (rax_4 != rax)
0058524b ___stack_chk_fail()
0058524b noreturn
0058524a return rax_4
If I right click -> Display As -> Character Constant on those hex constants, it looks a bit more readable:
005851a0 int64_t SerumGUI::getPath_AppDataBase(int64_t arg1, int64_t* arg2)
005851b5 int64_t rax = *___stack_chk_guard
005851d4 void var_70
005851d4 if (_FSFindFolder(0xffff8005, 0x61737570, 0, &var_70) == 0)
005851dc int64_t rax_2 = _CFURLCreateFromFSRef(0, &var_70)
005851f1 _CFURLGetFileSystemRepresentation(rax_2, 0, arg2, 0x200)
005851f9 _CFRelease(rax_2)
00585201 int64_t rax_3 = _strlen(arg2)
- 🐛00585210 *(arg2 + rax_3) = 0x6566782e6d6f632f
- 🐛0058521e *(arg2 + rax_3 + 8) = 0x7364726f63657272
- 🐛0058522d *(arg2 + rax_3 + 0x10) = 0x2f6d757265732e
+ 🐛00585210 *(arg2 + rax_3) = '/com.xfe'
+ 🐛0058521e *(arg2 + rax_3 + 8) = 'rrecords'
+ 🐛0058522d *(arg2 + rax_3 + 0x10) = '.serum/'
00585239 int64_t rax_4 = *___stack_chk_guard
00585240 if (rax_4 != rax)
0058524b ___stack_chk_fail()
0058524b noreturn
0058524a return rax_4
But it would be nicer if it was automatically something more like this (or similar):
005851a0 int64_t SerumGUI::getPath_AppDataBase(int64_t arg1, int64_t* arg2)
005851b5 int64_t rax = *___stack_chk_guard
005851d4 void var_70
005851d4 if (_FSFindFolder(0xffff8005, 0x61737570, 0, &var_70) == 0)
005851dc int64_t rax_2 = _CFURLCreateFromFSRef(0, &var_70)
005851f1 _CFURLGetFileSystemRepresentation(rax_2, 0, arg2, 0x200)
005851f9 _CFRelease(rax_2)
00585201 int64_t rax_3 = _strlen(arg2)
- 🐛00585210 *(arg2 + rax_3) = 0x6566782e6d6f632f
- 🐛0058521e *(arg2 + rax_3 + 8) = 0x7364726f63657272
- 🐛0058522d *(arg2 + rax_3 + 0x10) = 0x2f6d757265732e
+ 🐛00585210 *(arg2 + rax_3) = '/com.xferrecords.serum/'
00585239 int64_t rax_4 = *___stack_chk_guard
00585240 if (rax_4 != rax)
0058524b ___stack_chk_fail()
0058524b noreturn
0058524a return rax_4
Or even better would be:
005851a0 int64_t SerumGUI::getPath_AppDataBase(int64_t arg1, int64_t* arg2)
005851b5 int64_t rax = *___stack_chk_guard
005851d4 void var_70
005851d4 if (_FSFindFolder(0xffff8005, 0x61737570, 0, &var_70) == 0)
005851dc int64_t rax_2 = _CFURLCreateFromFSRef(0, &var_70)
005851f1 _CFURLGetFileSystemRepresentation(rax_2, 0, arg2, 0x200)
005851f9 _CFRelease(rax_2)
- 00585201 int64_t rax_3 = _strlen(arg2)
- 🐛00585210 *(arg2 + rax_3) = 0x6566782e6d6f632f
- 🐛0058521e *(arg2 + rax_3 + 8) = 0x7364726f63657272
- 🐛0058522d *(arg2 + rax_3 + 0x10) = 0x2f6d757265732e
+ 🐛00585210 _strcat(arg2, "/com.xferrecords.serum/")
00585239 int64_t rax_4 = *___stack_chk_guard
00585240 if (rax_4 != rax)
0058524b ___stack_chk_fail()
0058524b noreturn
0058524a return rax_4
- https://www.geeksforgeeks.org/strcat-in-c/
-
char *strcat(char *dest, const char *src);
-
Are any alternative solutions acceptable?
Unsure.
Additional Information:
- This last screenshot shows another thing that would be nice to improve (which maybe is worth opening as a separate issue), where 32 bit shifts are also being used within the string operations
I think the issue here is our stack string detection does not (yet) work on offsets into variables. This is a nice feature request, thx!
@0xdevalias could you please provide the binary so that it would be easier for us to work on this issue?
could you please provide the binary so that it would be easier for us to work on this issue
@xusheng6 I'll see if the same thing is present in the demo version, and if so, can share that (the exact specifics may differ slightly from above screenshots, but should otherwise be fine)
If so, will share via slack, and then update here.
I'll see if the same thing is present in the demo version
@xusheng6 Looks like it is:
Uploaded on slack here:
- https://binaryninja.slack.com/archives/C0CVALTLN/p1717563492495739?thread_ts=1717398271.612389&cid=C0CVALTLN
This is a good test case for strcat. We have a general issue as well (#3349) but will leave this one open for this specific function.
Linking to this tangentially related issue as well for visibility:
- https://github.com/Vector35/binaryninja-api/issues/2185
@0xdevalias Would you mind please re-uploading the binary to https://portal.binary.ninja/ ? (The uploaded file on slack has expired.)
Would you mind please re-uploading the binary to portal.binary.ninja ?
@galenbwill Skimming through what I included in the original post, I think I know which binary it was and should still have access to it.
Though I can't seem to see where in the portal I should be able to submit it?
I have the file already. No need to upload.
How does this look?
@bpotchik Took my pre-☕ 💤 🧠 a second to re-acquaint and parse what was going on here, but going from my original:
To:
Is a definite win ✅
And then if I'm inferring correctly, I assume that is in a lower level IL representation; and then the bottom right corner of your screenshot shows it getting improved again in High Level IL to:
Which is also very nice ✅
@bpotchik Also, curious (and may be irrelevant now since the above likely eliminates this level of detail/view anyway), but have there been any additions/changes/improvements that would address/simplify this aside I mentioned:
This last screenshot shows another thing that would be nice to improve (which maybe is worth opening as a separate issue), where 32 bit shifts are also being used within the string operations
I'm not sure if that specific function is in the demo binary I uploaded though.. will check.
Edit: Unfortunately it seems to be in a different version of the binary, I can probably upload that one if I knew where that service was (I couldn't find it when I looked earlier)
To demonstrate it in a potentially more useful form than the original screenshot though; here is a copy/paste out of Binary Ninja:
_strlen/etc results being shifted
005cc360 if (_FSFindFolder(0xffff8005, 'pusa', 0, &foundRef, zx.o(0)) == 0)
005cc46a int64_t foundRefUrl = _CFURLCreateFromFSRef(0, &foundRef) // Creates a URL from a given directory or file.
005cc484 // CFURLGetFileSystemRepresentation(
005cc484 // url: CFURL!,
005cc484 // resolveAgainstBase: Bool,
005cc484 // buffer: UnsafeMutablePointer<UInt8>!,
005cc484 // maxBufLen: CFIndex
005cc484 // ) -> Bool
005cc484 //
005cc484 // Fills a buffer with the file system's native string representation of a given URL's path.
005cc484 _CFURLGetFileSystemRepresentation(foundRefUrl, 0, thisSerumDSP->presetFilename, 512)
005cc48c // void CFRelease(CFTypeRef cf);
005cc48c //
005cc48c // Releases a Core Foundation object.
005cc48c _CFRelease(foundRefUrl)
005cc491 char* presetFilename_4 = thisSerumDSP->presetFilename
🐛005cc49c int64_t presetFilename_4Length = _strlen(presetFilename_4)
🐛005cc4ab *(presetFilename_4 + presetFilename_4Length) = '/com.xfe'
🐛005cc4b9 *(presetFilename_4 + presetFilename_4Length + 8) = 'rrecords'
🐛005cc4c8 *(presetFilename_4 + presetFilename_4Length + 16) = '.serum/u'
🐛005cc4d7 *(presetFilename_4 + presetFilename_4Length + 24) = 'ser.dat'
005cc4eb int64_t userDatFileHandle = _fopen(thisSerumDSP->presetFilename, "wb")
005cc4f6 _rewind(userDatFileHandle)
005cc503 __builtin_strncpy(dest: &tildeStr, src: "~~~~~~~~~~~~~~~~", n: 16)
005cc503
005cc522 if (_fwrite(&tildeStr, 0x10, 1, userDatFileHandle) u> 0xe)
005cc531 _fclose(userDatFileHandle)
005cc522 else
005cc527 SerumDSP::InitStuff(thisSerumDSP)
005cc360 else
005cc376 _strcpy(thisSerumDSP->presetFilename, thisSerumDSP->_maybeSerumPresetsPath)
005cc37b char* presetFilename_2 = thisSerumDSP->presetFilename
005cc386 int64_t presetFilename_2Length = _strlen(presetFilename_2)
🐛005cc395 *(presetFilename_2 + presetFilename_2Length) = '/System/'
🐛005cc3a3 *(presetFilename_2 + presetFilename_2Length + 8) = 'utermmat'
🐛005cc3a8 presetFilename_2[presetFilename_2Length + 16] = '\x00'
005cc3ad char* presetFilename_3 = thisSerumDSP->presetFilename
🐛005cc3bd int64_t presetFilenameLength = _strlen(presetFilename_3) << 32 // A << 32 bit shift on a 64-bit integer moves the original 32-bit length value from the lower 32 bits to the upper 32 bits of the 64-bit integer
🐛005cc3d2 presetFilename_3[(-30064771072 + presetFilenameLength) s>> 32] = 's' // -30064771072 >> 32 == -7
🐛005cc3d6 thisSerumDSP->presetFilename[(-17179869184 + presetFilenameLength) s>> 32] = '.' // -17179869184 >> 32 == -4
🐛005cc3f3 thisSerumDSP->presetFilename[(-12884901888 + presetFilenameLength) s>> 32] = 'd' // -12884901888 >> 32 == -3
005cc41f // thisSerumDSP->presetFilename = /System/utermmat
005cc41f //
005cc41f // After replacing length(16)-7 with s: /System/usermmat
005cc41f // After replacing length(16)-4 with .: /System/user.mat
005cc41f // After replacing length(16)-3 with .: /System/user.dat
005cc41f int64_t rax_40 = _fopen(thisSerumDSP->presetFilename, "wb")
005cc42f __builtin_strncpy(dest: &tildeStr, src: "~~~~~~~~~~~~~~~~", n: 16)
005cc42f
005cc44e if (_fwrite(&tildeStr, 16, 1, rax_40) u<= 14)
005cc457 SerumDSP::InitStuff(thisSerumDSP)
005cc457
005cc531 _fclose(rax_40)
Yeah I'm not sure if any updates I made would specifically help with the shifts. If I had an example binary I would be more than willing to spend some time looking into it.
A general update for this issue. I've updated the outlining feature to perform string reconstruction for cases like this. For now, outlining will result in a builtin_strcpy + strlen, or some similar combination. So now we will at least reconstruct the string properly and display it.
I have some POC implementation for outlining strcpy/strlen to strcat, although I've currently held it back until the implementation matures some more.
Yeah I'm not sure if any updates I made would specifically help with the shifts. If I had an example binary I would be more than willing to spend some time looking into it.
@bpotchik Example binary uploaded as clever river explores happily; and I also created a new issue to more specifically track the 32 bit shift aspect of things:
- https://github.com/Vector35/binaryninja-api/issues/7044