PINCE
Feature request: Heap scanning with data structure detection
Once memory scanning is implemented, an additional feature for detecting simple data structures would be great.
For example, one could hook all malloc calls using the LD_PRELOAD environment variable in order to detect allocated units and outline them graphically in the memory viewer. Furthermore, if a byte sequence within a block of allocated memory represents a valid heap or stack address, it could be highlighted as a possible pointer.
Thank you for the efforts which you put into this great project.
Sure, why not, it looks very useful. But this will probably be implemented in the very last phases, because I'm planning to finish the debugger and code injection engine first in order to give the scanmem team more time to develop libscanmem. Also, libscanmem is missing some features; I'll try to help develop it when this project reaches the code scanning phase. My current plan is as follows:
setup.py-->refactoring of libPINCE for OOP usage-->basic debugger-->breakpoints-->code injection(single line then code cave injection)-->signal bypassing&Anti anti-debugger tricks in general-->final GUI tweaks/refactoring-->memory scanning-->pointer scanning&this feature
Thank you for the efforts which you put into this great project
Well, someone had to get the boulder rolling :smiley:
Hooking all malloc() calls produces a huge amount of data, and backtracing takes quite some time. So if you want to do this, you should know what to look for and filter before backtracing. Otherwise, real-time libs like OpenGL notice a problem and exit the game. ugtrain already has dynamic memory discovery/hacking/adaption based on malloc() hooking and LD_PRELOAD. It has awesome Chromium B.S.U., Cube 2: Sauerbraten and Warzone 2100 examples based on this.
Thanks @sriemer, I'll keep that in mind. I also have a few concerns about the LD_PRELOAD trick. Firstly, you have to restart the game, which is a huge drawback for games that have different state-saving mechanisms (some games even prevent you from quitting, check the OneShot RPG for instance); we should find a runtime solution for that. Secondly, some games have protected binary loaders, and they might easily detect libraries loaded by LD_PRELOAD by checking /proc/$pid/maps for non-trusted paths.
I made a pointer scanner that does not rely on LD_PRELOAD, a debugger, or hooks. It will not be detected by the game; it only needs a memory dump file, so the game does not even need to be running. Maybe it will help you: https://github.com/scanmem/scanmem/issues/431
@kekeimiku That looks very cool! But integrating it into PINCE is a bit unlikely since it's a direct extension of the scanmem functionality and it feels like it should be integrated into scanmem instead
If you would like to integrate it as a 3rd party tool, maybe we could look into changing PointerSearcher-X output format to PINCE cheat table format so they would be compatible. If you are up for it, I can create a new issue with detailed info on the format for this kind of integration. It's up to you
@korcankaraokcu The PINCE cheat table doesn't seem to support resolving something like libhello+0x1234 as a base address?
PINCE uses gdb in the background for symbol resolving, and gdb supports symbols such as function names. You also have to stop the process to use any gdb functionality. PINCE internally uses the gdb API function parse_and_eval to evaluate anything you give it, but apparently it doesn't support resolving shared libraries.
More info on the symbols and gdb expressions: https://github.com/korcankaraokcu/PINCE/wiki/About-GDB-Expressions
Maybe the command info sharedlibrary could be used for this purpose. I'd either have to extend the examine_expression functionality or create a new function specifically for this purpose. If you would like to implement this on your own without any debugger interference, you can also parse pmap output to find base addresses. Which method would you like to proceed with?
I think parsing /proc/pid/maps is more efficient. We only need to find the first memory area named xxx with read permission and get its start address.
The question was more about which project should implement symbol resolving for shared libraries. But on second thought, it makes sense for PINCE to have this functionality, because otherwise you'd have to launch PointerSearcher every time to create a new cheat table.
I think parsing /proc/pid/maps is more efficient. We only need to find the first memory area named xxx with read permission and get its start address
Yeah, I agree. PINCE already uses a package called psutil for parsing this kind of information, so it could be done via that. I'll be looking into this soon. Meanwhile, you can work on converting pointer search results into cheat tables. Here's a detailed explanation of the cheat table format:
PINCE stores cheat tables with the pct extension. The Save button is handled by pushButton_Save_clicked in PINCE.py. It calls read_address_table_recursively, which reads the entire table. The function responsible for item conversion is read_address_table_entries. This function serializes items and makes them ready for copying or for turning them into a cheat table. It basically returns a list of description, address_expr, value_type. I'll explain further with an example. Below is a cheat table that contains two pointers:
[["No Description", ["0x561be37b2529", [12]], [2, 10, true, 0], []], ["No Description", ["0x561be37cb604", [4, 32]], [2, 10, true, 0], []]]
Save this as a pct file and load it in PINCE. You can also view it in here for clarity
Both entries have "No Description" as their description. The first entry has the base address "0x561be37b2529" and only one offset, which is 12 (0xC). The second entry has the base address "0x561be37cb604" and two offsets, 4 and 32, in that order. Both entries have the Int32 type, which is indicated by [2, 10, true, 0]. You can copy and paste this for now; I can also explain it further if you wish. Any questions?
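The entry layout described above can be sketched in Python as follows. This is only an illustration of the JSON shape shown in the example, not PINCE's own serialization code; make_entry, dump_table and INT32_TYPE are hypothetical names introduced here, and the value_type list is copied verbatim from the example.

```python
import json

# value_type copied from the example table above (described there as Int32)
INT32_TYPE = [2, 10, True, 0]


def make_entry(description, base_address, offsets, value_type=INT32_TYPE, children=None):
    """Build one entry: [description, [base, offsets], value_type, children]."""
    return [description, [base_address, list(offsets)], list(value_type), children or []]


def dump_table(entries):
    """Serialize a list of entries to the JSON text stored in a pct file."""
    return json.dumps(entries)


table = [
    make_entry("No Description", "0x561be37b2529", [12]),
    make_entry("No Description", "0x561be37cb604", [4, 32]),
]
pct_text = dump_table(table)
print(pct_text)
```

Writing pct_text to a file with the pct extension should then give something PINCE can load, assuming the format stays as described in this thread.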
Why is int32 indicated by [2, 10, true, 0]? What are other types indicated by?
[2, 10, true, 0], []]]
What is the last empty array?
Because that first array is the value_type representation in the JSON format.
The first value in the array is the VALUE_INDEX, which you can find in libpince/type_defs.py at line 157.
Thx
@brkzlr Thanks for the explanation. I'll add a little more information on this
value_index: Type of the value
length: Length of the entry, only used if the entry has length, defaults to 10
zero_terminate: Determines if the string is zero terminated, only used for strings
value_repr: Representation of the value, can be found in type_defs.py. Determines whether the value is shown as unsigned, signed or hexadecimal
what is the last empty array?
It's the children of the entry. The table has the structure of a tree. The one I sent you is basically a list, so it has no child entries. The table below has an entry that has exactly one child. Load it in PINCE and observe for yourself:
[["No Description", ["0x561be37b2529", [12]], [2, 10, true, 0], []], ["No Description", ["0x561be37cb604", [4, 32]], [2, 10, true, 0], [["No Description", "printf", [2, 10, true, 1], []]]]]
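To make the tree structure concrete, here is a small Python sketch that walks the table above and prints each entry indented by its depth. It relies only on the four-element entry layout shown in this thread; walk is a hypothetical helper, not a PINCE function. Note that the address expression can be either a [base, offsets] pair or a plain string such as "printf".

```python
import json

# The table from the comment above: the second entry has exactly one child.
PCT_TEXT = (
    '[["No Description", ["0x561be37b2529", [12]], [2, 10, true, 0], []], '
    '["No Description", ["0x561be37cb604", [4, 32]], [2, 10, true, 0], '
    '[["No Description", "printf", [2, 10, true, 1], []]]]]'
)


def walk(entries, depth=0):
    """Yield (depth, description, address_expr, value_type), parents before children."""
    for description, address_expr, value_type, children in entries:
        yield depth, description, address_expr, value_type
        yield from walk(children, depth + 1)


rows = list(walk(json.loads(PCT_TEXT)))
for depth, description, address_expr, value_type in rows:
    print("  " * depth + f"{description}: {address_expr} {value_type}")
```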
@kekeimiku I've realized something about memory pages while working on your request. Not everything is a so file; there are multiple pages with different file extensions, for instance kwidgetsaddons5_qt.qm. Do you want me to include everything or just so files? Which pages exactly do you search while searching for pointers?
Currently, pointer searches only care about regions that have read permission, whose path does not contain /usr or /dev, and that match one of the following:
[stack]
[heap]
path is binary
path is empty
For PINCE, you only need to find the first ELF file with the specified name in /proc/pid/maps, according to the input, and then get its starting address.
Example: maps
0x200001-0x3000008 r-- /home/aabb/hihihi
...
0x300001-0x4000008 r-- /home/aabb/hello.so
0x4000008-0x3000008 rw- /home/aabb/hello.so
Output of pointersearch: hello.so+0x1
It should be parsed as 0x300002, that is, 0x300001+0x1.
Output of pointersearch: hihihi+0x1
It should be parsed as 0x200002, that is, 0x200001+0x1.
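The lookup described above could be sketched in Python roughly like this. It assumes the usual /proc/<pid>/maps line layout (address range, permissions, offset, device, inode, optional path) rather than the simplified layout of the example, and it does not check that the file is actually an ELF. find_base_address, resolve and the sample addresses are hypothetical and are not taken from PINCE or PointerSearcher-X.

```python
import os

# Hypothetical sample in the /proc/<pid>/maps layout, mirroring the example above.
SAMPLE_MAPS = """\
00200000-00300000 r--p 00000000 08:01 100 /home/aabb/hihihi
00300000-00400000 r--p 00000000 08:01 101 /home/aabb/hello.so
00400000-00500000 rw-p 00100000 08:01 101 /home/aabb/hello.so
7ffc00000000-7ffc00021000 rw-p 00000000 00:00 0 [stack]
"""


def find_base_address(maps_text, name):
    """Return the start of the first readable mapping whose file name is `name`."""
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # no path on this line, so there is nothing to match
        addr_range, perms, path = fields[0], fields[1], fields[5].strip()
        if perms.startswith("r") and os.path.basename(path) == name:
            return int(addr_range.split("-")[0], 16)
    raise LookupError(f"no readable mapping named {name!r}")


def resolve(maps_text, expression):
    """Resolve an expression such as 'hello.so+0x1' to an absolute address."""
    name, _, offset = expression.partition("+")
    base = find_base_address(maps_text, name.strip())
    return base + (int(offset, 16) if offset else 0)


print(hex(resolve(SAMPLE_MAPS, "hello.so+0x1")))  # 0x300001
print(hex(resolve(SAMPLE_MAPS, "hihihi+0x1")))    # 0x200001
```

For a live process, the text would come from reading /proc/<pid>/maps instead of the sample string.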
My English is terrible/bad. please feel free to contact me if anything is unclear.
My English is terrible/bad. please feel free to contact me if anything is unclear
Your English is very clear, don't worry
path is empty
But, how are we going to reference such region? As I understand, we are going to parse the path and get the library name. If there's no path, how are we supposed to reference it? Did I miss something? Or did you mean to exclude those?
If there is no path, we can ignore it. We can return an error if an ELF named xxx cannot be found.
So, do we exclude those rules then? I mean, ignore if:
[stack]
[heap]
path is binary
path is empty
We only need areas where the pathname is a binary file. Others can be ignored.
Aight, thanks for clearing it up
How do you feel about doing this in pointersearch, and then just calling scanmem/pointersearch?
I mean resolving the address of the pointer chain.
Maybe we can move all pointer search related functions to scanmem, so PINCE only needs to focus on scanmem.
How do you feel about doing this in pointersearch, then just call scanmem/pointersearch
Users will eventually want to use .so symbols in their scripts, it makes sense for libpince to have this kind of symbol recognition. Don't worry, I'll most likely finish this by tomorrow. I was focused on some visual bugs that I noted in the past but I'm done with them now
I've finished it but need to optimize it a bit before releasing, sorry for the delay
Aight, I've finished it. Enjoy using this new feature. psutil was a bit slower than I expected: 30ms on the first call, which is a bit slow for what it is. I can also do the parsing myself if this becomes a problem in the future or if we don't use the extras of psutil
There's one caveat about this feature. examine_expression handles all of the symbol recognition, and this new feature was implemented inside of it because it makes sense design-wise. However, examine_expression uses gdb to resolve symbols, so you'll have to stop the process in order to use this feature. I'll try to change the behavior of PINCE in the near future to make it usable even when the process isn't stopped
How does PINCE resolve pointer chains? It seems to be different than expected.
For example: [["No Description", ["0x7f08fa222050", [0, 24, 16]], [2, 10, true, 0], []]]
The expectation is that it reads "ptr1" from "0x7f08fa222050+0", then "ptr2" from "ptr1+24", and finally takes "ptr2+16" as the target.
For example:
proc = OpenProcess(pid)
base_address = 0x7f08fa222050
buf = [0;8] //A 8-byte pointer-sized buf
proc.read(buf, base_address + 0) // read 8 bytes from `base_address + 0`
ptr1 = uint64(buf) // convert 8 bytes of buf to uint64
proc.read(buf, ptr1 + 24) // read 8 bytes from `ptr1 + 24`
ptr2 = uint64(buf) // convert 8 bytes of buf to uint64
target = ptr2 + 16
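The same expectation can be written as a runnable Python sketch. This illustrates the semantics described in the comment above, not how PINCE actually resolves chains. It assumes 8-byte little-endian pointers; resolve_chain, fake_read and FAKE_MEMORY are hypothetical names, with a small dictionary standing in for the target process's memory.

```python
import struct


def resolve_chain(read_bytes, base_address, offsets, pointer_size=8):
    """Dereference at every offset except the last, then add the last offset."""
    fmt = "<Q" if pointer_size == 8 else "<I"  # assumes little-endian pointers
    address = base_address
    for offset in offsets[:-1]:
        (address,) = struct.unpack(fmt, read_bytes(address + offset, pointer_size))
    return address + offsets[-1] if offsets else address


# Fake memory: maps an address to the pointer value stored there.
FAKE_MEMORY = {0x1000: 0x2000, 0x2018: 0x3000}


def fake_read(address, size):
    """Stand-in for a real process memory read; returns `size` bytes."""
    return struct.pack("<Q", FAKE_MEMORY[address])[:size]


target = resolve_chain(fake_read, 0x1000, [0, 24, 16])
print(hex(target))  # 0x3010
```

In a real tool, read_bytes would be replaced by a function that reads from the target process, which needs the appropriate permissions.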
This conversation has been moved to discord to not derail the original subject further