volatility3

Symbol Tables: additional APIs

Open Abyss-W4tcher opened this issue 3 months ago • 4 comments

Hi 👋,

This PR introduces two additional APIs for symbol tables:

  • symbols_as_dict : get the entire symbol table as a dictionary (name:symbol_object). It is useful to quickly iterate on all the values, and faster than doing [symbol_table.get_symbol(symbol) for symbol in symbol_table.symbols]
  • update_symbol_address : update a symbol address in the symbol table. This is only applicable to the context, and does not modify the ISF. It also takes care of invalidating the cache.
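
As a rough illustration of the two proposed calls, here is a minimal sketch built on a toy class. `ToySymbolTable`, `Symbol`, and the internal `_raw`, `_overrides`, and `_cache` attributes are hypothetical stand-ins, not the volatility3 implementation; only the method names `get_symbol`, `symbols`, `symbols_as_dict`, and `update_symbol_address` come from the description above. It shows the shape of the API only and says nothing about performance.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Symbol:
    name: str
    address: int


class ToySymbolTable:
    """Hypothetical stand-in for a symbol table (not the volatility3 class)."""

    def __init__(self, raw: Dict[str, int]) -> None:
        self._raw = dict(raw)                  # plays the role of the ISF data; never modified
        self._overrides: Dict[str, int] = {}   # context-only address updates
        self._cache: Dict[str, Symbol] = {}    # resolved symbols

    @property
    def symbols(self) -> List[str]:
        return list(self._raw)

    def get_symbol(self, name: str) -> Symbol:
        if name not in self._cache:
            address = self._overrides.get(name, self._raw[name])
            self._cache[name] = Symbol(name, address)
        return self._cache[name]

    def symbols_as_dict(self) -> Dict[str, Symbol]:
        # One call returning name -> symbol object for the whole table.
        return {name: self.get_symbol(name) for name in self._raw}

    def update_symbol_address(self, name: str, address: int) -> None:
        if name not in self._raw:
            raise KeyError(name)
        self._overrides[name] = address
        self._cache.pop(name, None)            # invalidate the cached entry


table = ToySymbolTable({"version_major": 0x1000, "version_minor": 0x1004})
before = table.get_symbol("version_major").address
table.update_symbol_address("version_major", 0x5000)
all_symbols = table.symbols_as_dict()
```

In this sketch the underlying data (`_raw`) keeps the original address while lookups return the updated one, which mirrors the "context only, ISF untouched" behaviour described in the second bullet.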

I will use these features in a future Pull Request, related to macOS, to circumvent a new format in the kernel that slides some symbols with an additional offset.

I look forward to your comments and reviews!

Abyss-W4tcher avatar Mar 18 '24 23:03 Abyss-W4tcher

symbols_as_dict would be OK (though it might be overkill), but update_symbol_address is very bad. It could lead to two runs with the same context and symbol tables giving different results. It also means that saving a configuration that contains the symbol table and rerunning it could give different results.

Could you provide more rationale for why you need to alter the symbol in the symbol table, as opposed to just getting a symbol and applying a transform to its value? I think an adaptor that returns different (but consistent) results for a symbol would be better than this.

ikelos avatar Mar 20 '24 21:03 ikelos

Being able to fetch all symbols with a single endpoint is convenient and more performant, even if it is indeed close to the existing symbols API 👍.

Regarding the symbol-updating feature: for my needs, the shift applied depends on the requirements of a custom Intel layer, which are saved to the config when this custom layer is needed. The shift is applied automatically, only when the relevant kernel SymbolTable is instantiated, and can be toggled on or off if a user wants to create a kernel SymbolTable clone without this shift, for whatever reason:

https://github.com/volatilityfoundation/volatility3/blob/55dd39f2ba60ffdd2126b7ea011940f0df42815a/volatility3/framework/symbols/mac/__init__.py#L11

To be concise, a large part of the symbols (almost half) in new macOS kernels are shifted with a slide different from the KASLR one. I developed logic that circumvents this problem (given one symbol, should I use the KASLR slide or the other slide to translate it?) by sliding the "out of place" symbol addresses to a position reachable with the KASLR slide. Of course, this will be fully explained in the related PR.

I thought of a get_symbol override, but it would need to be macOS-specific and apply only if certain requirements are met. Wouldn't this need changes in places that shouldn't know about layer types (volatility3/framework/symbols/intermed.py)?

Apart from this specific need, this feature also tries to address part of the following TODO:

https://github.com/volatilityfoundation/volatility3/blob/55dd39f2ba60ffdd2126b7ea011940f0df42815a/volatility3/framework/symbols/intermed.py#L396

Abyss-W4tcher avatar Mar 21 '24 08:03 Abyss-W4tcher

I see, thanks for the explanation. The reason for my caution is that we originally tried this tactic with the ASLR shifting (so a whole symbol table was shifted) and it ran into all the reproducibility problems I mentioned above. It cost a major version bump to remove all the symbol_shift functionality from core, which is why I want to think through the solution very carefully before pushing ahead with it.

https://github.com/volatilityfoundation/volatility3/commit/8f1c5ee55c040e7c748c95ce8fff6b19992c954c

This was the reason for introducing modules, which can have a specific shift applied to them at module load time: the symbols stay the same, but the module can be offset by a different amount. The module offset allows symbols to be shifted, but only when they're accessed through the module. That way the core values don't change, and it's clearer that there's a layer in the middle (the module) messing with the results. I'd far prefer that a second module at a different offset be constructed using the same symbol table than that an existing symbol table have its symbols tinkered with. The comment about being able to change symbols probably needs removing since, on the whole, the experiment in 2021 proved that changing them from the original data is a tricky and dangerous tactic.

By creating a module with an offset, you get a unique unit that can shift all symbols within it and record the necessary changes in the configuration. I'm not sure how we ensure there are two symbol tables for Macs, but I'd much prefer to research down that path than the one presented above...

https://github.com/volatilityfoundation/volatility3/blob/55dd39f2ba60ffdd2126b7ea011940f0df42815a/volatility3/framework/contexts/__init__.py#L256

ikelos avatar Mar 21 '24 10:03 ikelos

If you have previous experience with this setup, we can definitely think about something else. The intermediate "module" object seems less destructive 👍.

The options would be keeping a unique "kernel" module, with two potential offsets to choose from automatically, or having two different modules, one for each offset. But how would users know which one to use, while also keeping it transparent to existing plugins?

Roughly sketched, the logic inside object and object_from_symbol (I might have missed other APIs) would be something like:

if not absolute and OTHER_MAC_SPECIFIC_CONDITION:
    slide = ADDITIONAL_LOGIC_TO_DETERMINE_SLIDE_TO_USE(symbol)
    offset += slide
elif not absolute:
    offset += self._offset

This might not be what you meant, because of:

https://github.com/volatilityfoundation/volatility3/blob/55dd39f2ba60ffdd2126b7ea011940f0df42815a/volatility3/framework/interfaces/context.py#L146


My goal is to keep every existing API (input, output) and only add the necessary new API functions. FYI, I already have the logic for determining which slide a symbol should be shifted with, and for determining the "two separate symbol tables", but this is not the point of this PR.

Abyss-W4tcher avatar Mar 21 '24 11:03 Abyss-W4tcher

Well, the idea would be you'd have something like:

module_normal = module("symbol_table", offset=first_offset)
module_weird = module("symbol_table", offset=second_offset)

and then you can do

normal_symbol = module_normal.get_symbol("onethat'snormal")
weird_symbol = module_weird.get_symbol("symbolwiththeweirdoffset")

You could make a separate holder object that figures out which module to use, but this way is much more explicit, so it is easier for a user to trace back and debug. It doesn't require messing with any of the existing internals and, critically, the user knows they have to keep track of symbols that come from different modules.

If they just do context.get_symbol("one that's normal"), it'll still have the originally defined offset, just as would happen with ASLR. This would stay fully in line with the existing APIs and shouldn't require changing anything.
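
To make the two-module idea concrete, here is a minimal sketch under stated assumptions: `ToyModule`, `absolute_address`, the symbol names, and the offset values are all hypothetical and only mimic the idea of a module that adds its own offset on access. The shared mapping is read-only, so neither module can alter the stored addresses.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class ToyModule:
    """Hypothetical module: one shared symbol mapping plus a per-module offset."""

    symbols: Mapping[str, int]
    offset: int

    def absolute_address(self, name: str) -> int:
        # The shift is applied at access time; the table itself is untouched.
        return self.symbols[name] + self.offset


# One read-only symbol table shared by both modules (made-up addresses).
SYMBOLS = MappingProxyType({"normal_symbol": 0x1000, "weird_symbol": 0x2000})

module_normal = ToyModule(SYMBOLS, offset=0x100000)  # e.g. the KASLR slide
module_weird = ToyModule(SYMBOLS, offset=0x300000)   # e.g. the second slide

normal_address = module_normal.absolute_address("normal_symbol")
weird_address = module_weird.absolute_address("weird_symbol")
```

Because the shift lives in the module, two runs over the same table and the same recorded offsets give the same addresses, which is the reproducibility property discussed earlier in the thread.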

ikelos avatar Mar 21 '24 17:03 ikelos

Being explicit is obviously better for tracing and debugging! However, users cannot predict which slide to use for a given symbol; that would rely on prior knowledge, which shouldn't be assumed. Also, most plugins start with kernel = context.modules[self.config["kernel"]] and go on from there; devs don't specify the KASLR offset by hand, they just use this module across functions.

About context.get_symbol(), it doesn't change anything, as you said, because ASLR isn't applied anyway.


For example, if a plugin written back in the day used the version_major symbol against 10.9.3 macOS kernels, the same symbol would now point to incorrect data on a 14.3 macOS kernel. This is also why the automagic wasn't working anymore.

Abyss-W4tcher avatar Mar 21 '24 18:03 Abyss-W4tcher

Sure, and I'm happy for you to provide an extra class that can make the decisions for them, but it needs to be extra and something they can clearly read through, so that the rest of volatility works fine without it. In essence, I'd expect you to create a new macOS smart kernel module which, under the hood, holds two different symbol tables with different shifts and proxies calls, like a normal kernel, to the right symbol table. That doesn't require any messing with core, and is fully explorable by people trying to figure out what's going on and why.

Since I believe there's automagic for making the kernel module that plugins request, we can update that bit to provide your new kernel module in its place. Would that work, and does it make sense?
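
One way to picture the "smart kernel module" described above is the following sketch. Every name here (`ToyModule`, `ToySmartKernel`, `module_for`, `uses_other_slide`, the symbol names, and the slide values) is hypothetical and not part of volatility3; the predicate that decides which slide applies is passed in, since that logic is outside the scope of this thread.

```python
from typing import Callable, Mapping


class ToyModule:
    """Hypothetical module: symbol mapping plus a fixed offset."""

    def __init__(self, symbols: Mapping[str, int], offset: int) -> None:
        self._symbols = symbols
        self.offset = offset

    def absolute_address(self, name: str) -> int:
        return self._symbols[name] + self.offset


class ToySmartKernel:
    """Hypothetical proxy that holds two modules and picks one per symbol."""

    def __init__(
        self,
        symbols: Mapping[str, int],
        kaslr_slide: int,
        other_slide: int,
        uses_other_slide: Callable[[str], bool],
    ) -> None:
        self._normal = ToyModule(symbols, kaslr_slide)
        self._shifted = ToyModule(symbols, other_slide)
        self._uses_other_slide = uses_other_slide

    def module_for(self, name: str) -> ToyModule:
        # Exposed so a user can inspect which module served a symbol.
        return self._shifted if self._uses_other_slide(name) else self._normal

    def absolute_address(self, name: str) -> int:
        # Same call shape as a plain module; dispatch happens behind it.
        return self.module_for(name).absolute_address(name)


symbols = {"normal_symbol": 0x1000, "weird_symbol": 0x2000}
kernel = ToySmartKernel(
    symbols,
    kaslr_slide=0x100000,
    other_slide=0x300000,
    uses_other_slide=lambda name: name == "weird_symbol",  # placeholder rule
)
normal_address = kernel.absolute_address("normal_symbol")
weird_address = kernel.absolute_address("weird_symbol")
```

Plugins that only call through the kernel object would not need to know about the second slide, while `module_for` keeps the decision traceable for anyone debugging.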

ikelos avatar Mar 22 '24 17:03 ikelos

Thanks for the guidelines! I will close this PR now, even if symbols_as_dict might be useful in the future. If anyone needs it, feel free to comment here again...

Abyss-W4tcher avatar Mar 23 '24 16:03 Abyss-W4tcher