binaryninja-api icon indicating copy to clipboard operation
binaryninja-api copied to clipboard

Add Outlining Support for strcat

Open 0xdevalias opened this issue 1 year ago • 11 comments

What is the feature you'd like to have?

Currently, in the binary I am looking at, there are a number of string operations that are treated as raw hex numbers applied to variable offsets. It would be nice if these showed in a more 'as it would appear in the code' way, of just general string operations.

Is your feature request related to a problem?

There are other similar examples in the binary (macOS, x86_64) I'm looking at, but this is the most self contained one:

  005851a0  int64_t SerumGUI::getPath_AppDataBase(int64_t arg1, int64_t* arg2)
  005851b5      int64_t rax = *___stack_chk_guard
  005851d4      void var_70
  005851d4      if (_FSFindFolder(0xffff8005, 0x61737570, 0, &var_70) == 0)
  005851dc          int64_t rax_2 = _CFURLCreateFromFSRef(0, &var_70)
  005851f1          _CFURLGetFileSystemRepresentation(rax_2, 0, arg2, 0x200)
  005851f9          _CFRelease(rax_2)
  00585201          int64_t rax_3 = _strlen(arg2)
🐛00585210          *(arg2 + rax_3) = 0x6566782e6d6f632f
🐛0058521e          *(arg2 + rax_3 + 8) = 0x7364726f63657272
🐛0058522d          *(arg2 + rax_3 + 0x10) = 0x2f6d757265732e
  00585239      int64_t rax_4 = *___stack_chk_guard
  00585240      if (rax_4 != rax)
  0058524b          ___stack_chk_fail()
  0058524b          noreturn
  0058524a      return rax_4

If I right click -> Display As -> Character Constant on those hex constants, it looks a bit more readable:

    005851a0  int64_t SerumGUI::getPath_AppDataBase(int64_t arg1, int64_t* arg2)
    005851b5      int64_t rax = *___stack_chk_guard
    005851d4      void var_70
    005851d4      if (_FSFindFolder(0xffff8005, 0x61737570, 0, &var_70) == 0)
    005851dc          int64_t rax_2 = _CFURLCreateFromFSRef(0, &var_70)
    005851f1          _CFURLGetFileSystemRepresentation(rax_2, 0, arg2, 0x200)
    005851f9          _CFRelease(rax_2)
    00585201          int64_t rax_3 = _strlen(arg2)
- 🐛00585210          *(arg2 + rax_3) = 0x6566782e6d6f632f
- 🐛0058521e          *(arg2 + rax_3 + 8) = 0x7364726f63657272
- 🐛0058522d          *(arg2 + rax_3 + 0x10) = 0x2f6d757265732e
+ 🐛00585210          *(arg2 + rax_3) = '/com.xfe'
+ 🐛0058521e          *(arg2 + rax_3 + 8) = 'rrecords'
+ 🐛0058522d          *(arg2 + rax_3 + 0x10) = '.serum/'
    00585239      int64_t rax_4 = *___stack_chk_guard
    00585240      if (rax_4 != rax)
    0058524b          ___stack_chk_fail()
    0058524b          noreturn
    0058524a      return rax_4

But it would be nicer if it was automatically something more like this (or similar):

    005851a0  int64_t SerumGUI::getPath_AppDataBase(int64_t arg1, int64_t* arg2)
    005851b5      int64_t rax = *___stack_chk_guard
    005851d4      void var_70
    005851d4      if (_FSFindFolder(0xffff8005, 0x61737570, 0, &var_70) == 0)
    005851dc          int64_t rax_2 = _CFURLCreateFromFSRef(0, &var_70)
    005851f1          _CFURLGetFileSystemRepresentation(rax_2, 0, arg2, 0x200)
    005851f9          _CFRelease(rax_2)
    00585201          int64_t rax_3 = _strlen(arg2)
- 🐛00585210          *(arg2 + rax_3) = 0x6566782e6d6f632f
- 🐛0058521e          *(arg2 + rax_3 + 8) = 0x7364726f63657272
- 🐛0058522d          *(arg2 + rax_3 + 0x10) = 0x2f6d757265732e
+ 🐛00585210          *(arg2 + rax_3) = '/com.xferrecords.serum/'
    00585239      int64_t rax_4 = *___stack_chk_guard
    00585240      if (rax_4 != rax)
    0058524b          ___stack_chk_fail()
    0058524b          noreturn
    0058524a      return rax_4

Or even better would be:

    005851a0  int64_t SerumGUI::getPath_AppDataBase(int64_t arg1, int64_t* arg2)
    005851b5      int64_t rax = *___stack_chk_guard
    005851d4      void var_70
    005851d4      if (_FSFindFolder(0xffff8005, 0x61737570, 0, &var_70) == 0)
    005851dc          int64_t rax_2 = _CFURLCreateFromFSRef(0, &var_70)
    005851f1          _CFURLGetFileSystemRepresentation(rax_2, 0, arg2, 0x200)
    005851f9          _CFRelease(rax_2)
-   00585201          int64_t rax_3 = _strlen(arg2)
- 🐛00585210          *(arg2 + rax_3) = 0x6566782e6d6f632f
- 🐛0058521e          *(arg2 + rax_3 + 8) = 0x7364726f63657272
- 🐛0058522d          *(arg2 + rax_3 + 0x10) = 0x2f6d757265732e
+ 🐛00585210          _strcat(arg2, "/com.xferrecords.serum/")
    00585239      int64_t rax_4 = *___stack_chk_guard
    00585240      if (rax_4 != rax)
    0058524b          ___stack_chk_fail()
    0058524b          noreturn
    0058524a      return rax_4
  • https://www.geeksforgeeks.org/strcat-in-c/
    • char *strcat(char *dest, const char *src);

Are any alternative solutions acceptable?

Unsure.

Additional Information:

image

image

image

image

image

  • This last screenshot shows another thing that would be nice to improve (which maybe is worth opening as a separate issue), where 32 bit shifts are also being used within the string operations

0xdevalias avatar Jun 03 '24 08:06 0xdevalias

I think the issue here is our stack string detection does not (yet) work on offsets into variables. This is a nice feature request, thx!

xusheng6 avatar Jun 03 '24 08:06 xusheng6

@0xdevalias could you please provide the binary so that it would be easier for us to work on this issue?

xusheng6 avatar Jun 03 '24 09:06 xusheng6

could you please provide the binary so that it would be easier for us to work on this issue

@xusheng6 I'll see if the same thing is present in the demo version, and if so, can share that (the exact specifics may differ slightly from above screenshots, but should otherwise be fine)

If so, will share via slack, and then update here.

0xdevalias avatar Jun 05 '24 04:06 0xdevalias

I'll see if the same thing is present in the demo version

@xusheng6 Looks like it is:

image

Uploaded on slack here:

  • https://binaryninja.slack.com/archives/C0CVALTLN/p1717563492495739?thread_ts=1717398271.612389&cid=C0CVALTLN

0xdevalias avatar Jun 05 '24 04:06 0xdevalias

This is a good test case for strcat. We have a general issue as well (#3349) but will leave this one open for this specific function.

bpotchik avatar Jun 05 '24 15:06 bpotchik

Linking to this tangentially related issue as well for visibility:

  • https://github.com/Vector35/binaryninja-api/issues/2185

0xdevalias avatar Jun 07 '24 07:06 0xdevalias

@0xdevalias Would you mind please re-uploading the binary to https://portal.binary.ninja/ ? (The uploaded file on slack has expired.)

galenbwill avatar May 21 '25 18:05 galenbwill

Would you mind please re-uploading the binary to portal.binary.ninja ?

@galenbwill Skimming through what I included in the original post, I think I know which binary it was and should still have access to it.

Though I can't seem to see where in the portal I should be able to submit it?

0xdevalias avatar May 22 '25 05:05 0xdevalias

I have the file already. No need to upload.

bpotchik avatar May 22 '25 14:05 bpotchik

How does this look?

Image

bpotchik avatar May 22 '25 19:05 bpotchik

@bpotchik Took my pre-☕ 💤 🧠 a second to re-acquaint and parse what was going on here, but going from my original:

Image

To:

Image

Is a definite win ✅

And then if I'm inferring correctly, I assume that is in a lower level IL representation; and then the bottom right corner of your screenshot shows it getting improved again in High Level IL to:

Image

Which is also very nice ✅


@bpotchik Also, curious (and may be irrelevant now since the above likely eliminates this level of detail/view anyway), but have there been any additions/changes/improvements that would address/simplify this aside I mentioned:

This last screenshot shows another thing that would be nice to improve (which maybe is worth opening as a separate issue), where 32 bit shifts are also being used within the string operations

I'm not sure if that specific function is in the demo binary I uploaded though.. will check.

Edit: Unfortunately it seems to be in a different version of the binary, I can probably upload that one if I knew where that service was (I couldn't find it when I looked earlier)

To demonstrate it in a potentially more useful form than the original screenshot though; here is a copy/paste out of Binary Ninja:

_strlen/etc results being shifted
005cc360   if (_FSFindFolder(0xffff8005, 'pusa', 0, &foundRef, zx.o(0)) == 0)
005cc46a       int64_t foundRefUrl = _CFURLCreateFromFSRef(0, &foundRef)  // Creates a URL from a given directory or file.
005cc484       // CFURLGetFileSystemRepresentation(
005cc484       //   url: CFURL!,
005cc484       //   resolveAgainstBase: Bool,
005cc484       //   buffer: UnsafeMutablePointer<UInt8>!,
005cc484       //   maxBufLen: CFIndex
005cc484       // ) -> Bool
005cc484       // 
005cc484       // Fills a buffer with the file system's native string representation of a given URL's path.
005cc484       _CFURLGetFileSystemRepresentation(foundRefUrl, 0, thisSerumDSP->presetFilename, 512)
005cc48c       // void CFRelease(CFTypeRef cf);
005cc48c       // 
005cc48c       // Releases a Core Foundation object.
005cc48c       _CFRelease(foundRefUrl)
005cc491       char* presetFilename_4 = thisSerumDSP->presetFilename
🐛005cc49c      int64_t presetFilename_4Length = _strlen(presetFilename_4)
🐛005cc4ab      *(presetFilename_4 + presetFilename_4Length) = '/com.xfe'
🐛005cc4b9      *(presetFilename_4 + presetFilename_4Length + 8) = 'rrecords'
🐛005cc4c8      *(presetFilename_4 + presetFilename_4Length + 16) = '.serum/u'
🐛005cc4d7      *(presetFilename_4 + presetFilename_4Length + 24) = 'ser.dat'
005cc4eb       int64_t userDatFileHandle = _fopen(thisSerumDSP->presetFilename, "wb")
005cc4f6       _rewind(userDatFileHandle)
005cc503       __builtin_strncpy(dest: &tildeStr, src: "~~~~~~~~~~~~~~~~", n: 16)
005cc503       
005cc522       if (_fwrite(&tildeStr, 0x10, 1, userDatFileHandle) u> 0xe)
005cc531           _fclose(userDatFileHandle)
005cc522       else
005cc527           SerumDSP::InitStuff(thisSerumDSP)
005cc360   else
005cc376       _strcpy(thisSerumDSP->presetFilename, thisSerumDSP->_maybeSerumPresetsPath)
005cc37b       char* presetFilename_2 = thisSerumDSP->presetFilename
005cc386       int64_t presetFilename_2Length = _strlen(presetFilename_2)
🐛005cc395      *(presetFilename_2 + presetFilename_2Length) = '/System/'
🐛005cc3a3      *(presetFilename_2 + presetFilename_2Length + 8) = 'utermmat'
🐛005cc3a8      presetFilename_2[presetFilename_2Length + 16] = '\x00'
005cc3ad       char* presetFilename_3 = thisSerumDSP->presetFilename
🐛005cc3bd      int64_t presetFilenameLength = _strlen(presetFilename_3) << 32  // A << 32 bit shift on a 64-bit integer moves the original 32-bit length value from the lower 32 bits to the upper 32 bits of the 64-bit integer
🐛005cc3d2      presetFilename_3[(-30064771072 + presetFilenameLength) s>> 32] = 's'  // -30064771072 >> 32 == -7
🐛005cc3d6      thisSerumDSP->presetFilename[(-17179869184 + presetFilenameLength) s>> 32] = '.'  // -17179869184 >> 32 == -4
🐛005cc3f3      thisSerumDSP->presetFilename[(-12884901888 + presetFilenameLength) s>> 32] = 'd'  // -12884901888 >> 32 == -3
005cc41f       // thisSerumDSP->presetFilename = /System/utermmat
005cc41f       // 
005cc41f       // After replacing length(16)-7 with s: /System/usermmat
005cc41f       // After replacing length(16)-4 with .: /System/user.mat
005cc41f       // After replacing length(16)-3 with .: /System/user.dat
005cc41f       int64_t rax_40 = _fopen(thisSerumDSP->presetFilename, "wb")
005cc42f       __builtin_strncpy(dest: &tildeStr, src: "~~~~~~~~~~~~~~~~", n: 16)
005cc42f       
005cc44e       if (_fwrite(&tildeStr, 16, 1, rax_40) u<= 14)
005cc457           SerumDSP::InitStuff(thisSerumDSP)
005cc457       
005cc531       _fclose(rax_40)

0xdevalias avatar May 22 '25 22:05 0xdevalias

Yeah I'm not sure if any updates I made would specifically help with the shifts. If I had an example binary I would be more than willing to spend some time looking into it.

A general update for this issue. I've updated the outlining feature to perform string reconstruction for cases like this. For now, outlining will result in a builtin_strcpy + strlen, or some similar combination. So now we will at least reconstruct the string properly and display it.

I have some POC implementation for outlining strcpy/strlen to strcat, although I've currently held it back until the implementation matures some more.

bpotchik avatar Jul 03 '25 14:07 bpotchik

Yeah I'm not sure if any updates I made would specifically help with the shifts. If I had an example binary I would be more than willing to spend some time looking into it.

@bpotchik Example binary uploaded as clever river explores happily; and I also created a new issue to more specifically track the 32 bit shift aspect of things:

  • https://github.com/Vector35/binaryninja-api/issues/7044

0xdevalias avatar Jul 04 '25 04:07 0xdevalias