Linux basic support for scanned tasks
Hello!
This PR adds some basic support for tasks found via scanning to some of the existing linux plugins. I've not done them all as I wanted views on this first. Is this a good way to do it? Would it be better to allow users to provide a physical address to the plugins rather than using the linux.psscan plugin?
It would help work towards 924.
I've added them in slightly different ways to the existing plugins, perhaps one method is the best, or perhaps it doesn't really matter.
Here is an example of what this would allow, using the linux-sample-5 file (sha1: d1edf3635f2726033a81fab12364e03a111bba74).
First see that pid 2282 is not found with the normal pslist plugin:
python vol.py -f linux-sample-5.dmp linux.pslist --pid 2282
Volatility 3 Framework 2.5.1
Progress: 100.00 Stacking attempts finished
OFFSET (V) PID TID PPID COMM File output
Therefore it can't be used in lsof etc., as this pid isn't found:
python vol.py -f linux-sample-5.dmp linux.lsof --pid 2282
Volatility 3 Framework 2.5.1
Progress: 100.00 Stacking attempts finished
PID Process FD Path
<no results>
python vol.py -f linux-sample-5.dmp linux.elfs --pid 2282
Volatility 3 Framework 2.5.1
Progress: 100.00 Stacking attempts finished
PID Process Start End File Path File Output
<no results>
python vol.py -f linux-sample-5.dmp linux.psaux --pid 2282
Volatility 3 Framework 2.5.1
Progress: 100.00 Stacking attempts finished
PID PPID COMM ARGS
<no results>
However it is found with psscan:
$ python vol.py -f linux-sample-5.dmp linux.psscan --pid 2282
Volatility 3 Framework 2.5.1
Progress: 100.00 Stacking attempts finished
OFFSET (P) PID TID PPID COMM EXIT_STATE
0x1d24c7c0 2262 2282 2257 apache2 TASK_RUNNING
So passing the --scan option to the plugins that have been modified (lsof, psaux, and elfs) means that the results for this task are displayed:
lsof:
python vol.py -f linux-sample-5.dmp linux.lsof --scan --pid 2282
Volatility 3 Framework 2.5.1
Progress: 100.00 Stacking attempts finished
PID Process FD Path
2282 apache2 0 /dev/null
2282 apache2 1 /dev/null
2282 apache2 2 /var/log/apache2/error.log
2282 apache2 3 socket:[6225]
2282 apache2 4 socket:[6226]
2282 apache2 5 pipe:[6237]
2282 apache2 6 pipe:[6237]
2282 apache2 7 /var/log/apache2/other_vhosts_access.log
2282 apache2 8 /var/log/apache2/access.log
2282 apache2 10 anon_inode:[1518]
psaux:
python vol.py -f linux-sample-5.dmp linux.psaux --scan --pid 2282
Volatility 3 Framework 2.5.1
Progress: 100.00 Stacking attempts finished
PID PPID COMM ARGS
2282 2257 apache2 /usr/sbin/apache2 -k start
elfs:
python vol.py -f linux-sample-5.dmp linux.elfs --scan --pid 2282
Volatility 3 Framework 2.5.1
Progress: 100.00 Stacking attempts finished
PID Process Start End File Path File Output
2282 apache2 0x7fd330a17000 0x7fd330a2c000 /lib/x86_64-linux-gnu/libgcc_s.so.1 Disabled
2282 apache2 0x7fd33e45a000 0x7fd33e465000 /lib/x86_64-linux-gnu/libnss_files-2.13.so Disabled
2282 apache2 0x7fd33e666000 0x7fd33e670000 /lib/x86_64-linux-gnu/libnss_nis-2.13.so Disabled
2282 apache2 0x7fd33e871000 0x7fd33e886000 /lib/x86_64-linux-gnu/libnsl-2.13.so Disabled
2282 apache2 0x7fd33ea89000 0x7fd33ea90000 /lib/x86_64-linux-gnu/libnss_compat-2.13.so Disabled
2282 apache2 0x7fd33ec91000 0x7fd33ec95000 /usr/lib/apache2/modules/mod_status.so Disabled
2282 apache2 0x7fd33ee97000 0x7fd33ee9a000 /usr/lib/apache2/modules/mod_setenvif.so Disabled
2282 apache2 0x7fd33f09b000 0x7fd33f09e000 /usr/lib/apache2/modules/mod_reqtimeout.so Disabled
2282 apache2 0x7fd33f29f000 0x7fd33f2a7000 /usr/lib/apache2/modules/mod_negotiation.so Disabled
<SNIP>
Sorry - as soon as I submitted this I wanted to double check that this apache task wasn't actually a thread and that's why lsof, etc can't see it.
It turns out it is a thread, so when the --pid filter is passed to them they can't find the task because of how pid vs tgid is displayed by different plugins, e.g. this issue: https://github.com/volatilityfoundation/volatility3/issues/981
So it's not the ability to scan for tasks that is helping find these extra bits of information; they would have been displayed if no pid filter was given to the existing plugins. I need to find a good example where scanning for tasks is actually finding more information before this should be looked at.
Hey @eve-mem! There is still something weird here.
In your first output:
$ python vol.py -f linux-sample-5.dmp linux.pslist --pid 2282
Volatility 3 Framework 2.5.1
Progress: 100.00 Stacking attempts finished
OFFSET (V) PID TID PPID COMM File output
The create_pid_filter function filters by task.pid, so even if 2282 is a TID, it should appear there.
The other strange thing about your report is that linux.psscan currently doesn't accept any arguments. Have you modified your code to do that? I'm not sure why you didn't get the following error:
$ python3 vol.py -f linux-sample-5.dmp linux.psscan --pid 2282
Volatility 3 Framework 2.5.2
usage: volatility [-h] [-c CONFIG] [--parallelism [{processes,threads,off}]] [-e EXTEND] [-p PLUGIN_DIRS] [-s SYMBOL_DIRS] [-v]
[-l LOG] [-o OUTPUT_DIR] [-q] [-r RENDERER] [-f FILE] [--write-config] [--save-config SAVE_CONFIG] [--clear-cache]
[--cache-path CACHE_PATH] [--offline] [--single-location SINGLE_LOCATION] [--stackers [STACKERS ...]]
[--single-swap-locations [SINGLE_SWAP_LOCATIONS ...]]
plugin ...
volatility: error: unrecognized arguments: --pid 2282
Could you please double-check all this again?
Hello @gcmoreira - thanks for taking a look at this - even while it was marked as draft - I really appreciate it.
The --pid option for psscan is also added in this PR, which is why it isn't working for you there. Sorry for the confusion.
As for 2282 not appearing in pslist, that's because it's a thread. For example, filter for 2262 with --threads and you will see it there:
$ python vol.py -f linux-sample-5.dmp linux.pslist --pid 2262 --threads
Volatility 3 Framework 2.5.2
Progress: 100.00 Stacking attempts finished
OFFSET (V) PID TID PPID COMM File output
0x88001b7ff140 2262 2262 2257 apache2 Disabled
0x88001b5e7840 2262 2267 2257 apache2 Disabled
0x88001ec540c0 2262 2268 2257 apache2 Disabled
0x88001b5fd880 2262 2269 2257 apache2 Disabled
0x880019c618c0 2262 2270 2257 apache2 Disabled
0x88001ee08140 2262 2271 2257 apache2 Disabled
0x88001b7ff840 2262 2272 2257 apache2 Disabled
0x88001f6360c0 2262 2273 2257 apache2 Disabled
0x88001ec25080 2262 2274 2257 apache2 Disabled
0x88001f4fa800 2262 2275 2257 apache2 Disabled
0x88001d2438c0 2262 2276 2257 apache2 Disabled
0x88001d2431c0 2262 2277 2257 apache2 Disabled
0x88001d246740 2262 2278 2257 apache2 Disabled
0x88001d246040 2262 2279 2257 apache2 Disabled
0x88001d249780 2262 2280 2257 apache2 Disabled
0x88001d249080 2262 2281 2257 apache2 Disabled
0x88001d24c7c0 2262 2282 2257 apache2 Disabled <---- here it is
0x88001d24c0c0 2262 2283 2257 apache2 Disabled
0x88001d24f800 2262 2284 2257 apache2 Disabled
0x88001d24f100 2262 2285 2257 apache2 Disabled
0x88001d253840 2262 2286 2257 apache2 Disabled
0x88001d253140 2262 2287 2257 apache2 Disabled
0x88001d257880 2262 2288 2257 apache2 Disabled
0x88001d257180 2262 2289 2257 apache2 Disabled
0x88001d25a8c0 2262 2290 2257 apache2 Disabled
0x88001d25a1c0 2262 2291 2257 apache2 Disabled
0x88001d25d740 2262 2292 2257 apache2 Disabled
If you run lsof with no filter, for example, we don't see 2282 either, but we do see 2262. When lsof etc. get their tasks from pslist, they don't pass include_threads as True.
2262 apache2 0 /dev/null
2262 apache2 1 /dev/null
2262 apache2 2 /var/log/apache2/error.log
2262 apache2 3 socket:[6225]
2262 apache2 4 socket:[6226]
2262 apache2 5 pipe:[6237]
2262 apache2 6 pipe:[6237]
2262 apache2 7 /var/log/apache2/other_vhosts_access.log
2262 apache2 8 /var/log/apache2/access.log
2262 apache2 10 anon_inode:[1518]
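To make the include_threads point concrete, here is a rough sketch in plain Python. It is not Volatility code: the Task class and this list_tasks helper are hypothetical stand-ins, and the real plugin walks kernel structures rather than a Python list. It only models the idea that threads are skipped unless include_threads is True.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for a kernel task_struct (not a Volatility object)."""
    pid: int   # kernel task id, shown as TID in the output above
    tgid: int  # thread group id, shown as PID in the output above


def list_tasks(tasks: Iterable[Task], include_threads: bool = False) -> Iterator[Task]:
    """Yield thread group leaders, plus their threads only when asked to."""
    for task in tasks:
        is_leader = task.pid == task.tgid
        if is_leader or include_threads:
            yield task


# Three of the apache2 tasks from the pslist --threads output above.
apache = [Task(pid=2262, tgid=2262), Task(pid=2267, tgid=2262), Task(pid=2282, tgid=2262)]

leaders_only: List[int] = [t.pid for t in list_tasks(apache)]
with_threads: List[int] = [t.pid for t in list_tasks(apache, include_threads=True)]

print(leaders_only)  # [2262]
print(with_threads)  # [2262, 2267, 2282]
```

With the default, 2282 never reaches the consuming plugin, so a filter for 2282 has nothing to match.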
What's happening is that when psscan finds the task struct for 2282, we can pass it to lsof and the other plugins and they'll happily extract the information for it, but for all the tasks that aren't leaders it's not really useful. We can get the same information from the leader (I don't think threads can have different open files etc., though maybe I am wrong on that).
So if we continued with this PR, we would need to add checks to show the difference between threads and processes clearly. I still need to find a sample where it's possible to scan for an actual task whose mm etc. haven't yet been cleared. If that doesn't happen then it's not really useful to add the --scan option to the other plugins.
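As a rough sketch of the kind of check I mean (plain Python, not Volatility code; the Task class, the mm field as a plain integer, and the helper name are all hypothetical), a task would only be interesting if it was found by scanning, is missing from the list walk including threads, is a leader, and still has its mm set. Matching on pid alone is a simplification, since pids can be reused.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for a task_struct; not a Volatility object."""
    pid: int   # kernel task id (TID column)
    tgid: int  # thread group id (PID column)
    mm: int    # address of the task's mm_struct, 0 once it has been cleared


def scan_only_candidates(listed: Iterable[Task], scanned: Iterable[Task]) -> List[Task]:
    """Return scanned group leaders that are missing from the list walk and still have an mm."""
    listed_ids = {task.pid for task in listed}
    return [
        task
        for task in scanned
        if task.pid not in listed_ids  # not reachable by walking the task list
        and task.pid == task.tgid      # a leader, not just another thread
        and task.mm != 0               # address space not yet torn down
    ]


# Invented values for illustration only; none of these come from linux-sample-5.
listed = [Task(100, 100, 0x1000), Task(101, 100, 0x1000)]
scanned = listed + [
    Task(200, 200, 0),       # exited task, mm already cleared
    Task(301, 300, 0x3000),  # stray thread, not a leader
    Task(400, 400, 0x4000),  # the kind of task --scan would actually help with
]

candidates = scan_only_candidates(listed, scanned)
print([task.pid for task in candidates])  # [400]
```

So far every real sample I've looked at gives an empty result for this kind of check, which is the problem.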
@eve-mem very sorry, I got to this issue through another ticket and didn't notice that it actually had the draft label.
Anyway, I know it's a thread. What I am saying is:
- TIDs (task.pid) are unique; task.pid is actually "the" kernel task identifier. User PIDs (task.tgid) are not unique; task.tgid is a group identifier.
- The task filter function currently filters by TID (task.pid) which IMO is correct.
Given these two statements, and provided you haven't changed the task filter function, I believe there is a bug somewhere, because using "--pid" cannot return more than 1 task/row.
Given your output, it seems that either vol3 already, or your new code, is for some reason using 'task.tgid' as the task filter instead of 'task.pid'.
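A minimal sketch of the difference, in plain Python rather than Volatility code (the Task class and both helper names are made up for illustration, and this is not how create_pid_filter is written):

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for a task_struct; not a Volatility object."""
    pid: int   # kernel task id (TID), unique per task
    tgid: int  # thread group id (user-visible PID), shared by all threads


def select_by_tid(tasks: Iterable[Task], wanted: int) -> List[Task]:
    """Keep tasks whose task.pid matches: at most one task can match."""
    return [task for task in tasks if task.pid == wanted]


def select_by_tgid(tasks: Iterable[Task], wanted: int) -> List[Task]:
    """Keep tasks whose task.tgid matches: every thread in the group matches."""
    return [task for task in tasks if task.tgid == wanted]


# Three of the apache2 tasks from the output in this thread.
tasks = [Task(pid=2262, tgid=2262), Task(pid=2267, tgid=2262), Task(pid=2282, tgid=2262)]

by_tid = select_by_tid(tasks, 2282)
by_tgid_2282 = select_by_tgid(tasks, 2282)
by_tgid_2262 = select_by_tgid(tasks, 2262)

print(len(by_tid))        # 1
print(len(by_tgid_2282))  # 0
print(len(by_tgid_2262))  # 3
```

An empty result for 2282 and many rows for 2262 is what a tgid-based filter would produce, which is why the output looks suspicious to me.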
I hope I explained it better, but don't worry. It's better that you finish your changes and we will check this once it's ready. Sorry again 🙏🏻
Hello again @gcmoreira
Firstly re my comment about it being a draft PR.
I'm worried I came across as passive aggressive without meaning to. I really, truly appreciate all the help, advice, and guidance from you, @ikelos, and @atcuno. (And of course everyone else who points me in the right direction, but you three have been very generous with your time.)
The way I see it if your place of work offered a volatility3 plugin/code review service I'd be looking at spending many 1000s to get the same level of help.
Instead you're all here giving up your own time - for free - to patiently explain things to me, time you could be spending with family or on other things you'd rather be doing. So as honestly as I can get across in text form - thank you! I really mean it. Hence this extremely long comment...!
I view "ready for review" to mean: @eve-mem thinks this is probably good and correct, but if there are any mistakes or ways of doing this better I'd love to hear them.
"Draft" to mean: @eve-mem knows this PR isn't good enough to waste people's time with, but would welcome any and all comments. These are bits that I think probably do need doing, but right now it's not hitting the mark.
Then "closed" on my PRs to mean: @eve-mem thought this was a good idea, but after more thought, or some obvious issues pointed out by others, it really isn't the right way to be doing things and should just be forgotten completely.
Next re this PR and change as a whole.
I set this to draft as soon as I realised that the "extra" information wasn't actually new - vol was already displaying it.
I had thought I'd found a sample where being able to pass tasks found by scanning could provide extra information that wasn't easily accessible before - hurray, something useful! But I was wrong, I'd just found a thread, and vol would already show this information - so I set it to draft.
It's why I try to include snippets of output, volshell, etc. - so that when I make a mistake it's easy to point out. I also try to use the "linux-sample-N.dmp" files in the examples as they're somewhat easy to get hold of, meaning that it's easier for people to "trust but verify" by running them against the same image. Hopefully that makes it easier to spot my mistakes etc.
My motivation for this PR is to slowly get all of vol2's capabilities moved across where it makes sense. I see many of the vol2 windows plugins being able to accept an offset to objects and then provide the information on those (e.g. to save people jumping into volshell, and to lower that barrier to entry).
I thought it would make sense for the linux plugins to be able to work off tasks that have been found by scanning; maybe one found that way and not by walking the list could have some useful information. Spurred on by the linux psxview plugin from vol2, I was hoping that in some cases something is hiding from the list, or exited just recently enough, that having this option in vol3 would mean finding those extra bits of information.
However, right now I've not seen it. Maybe it'll never happen, or it's so rare that it's not worth people's time on this PR and that time is better spent on other parts of vol3.
If I do find an example where this option genuinely finds more information I'll probably make it ready again once I can prove to myself it is helping, rather than just finding threads and thinking they are whole new processes as I did before.
Right now all the tasks I see that aren't in the normal list have had mm cleared etc., meaning that even if you pointed vol3 at them there is no extra information to get - it's already been cleared.
If you or anyone else knows it'll never be possible to get extra information this way I'd happily close the PR too. Or if keeping this on the backlog as draft is cluttering things we can close it too.
Then lastly the pid/tid/tgid discussion
I'm worried I've come across as saying "volatility3 is wrong, we need to change it right away!!!!" and that's really not what I mean. I've not used the best examples (some being proxies, but actually completely wrong) and not explained myself well enough. I think for the most part everything in core is completely correct - it's just a finessing point that I'd love to see. I'm going to try and add a comment on the issue I'd raised to really cleanly explain what I mean. It'll just take me a little bit of time to make sure I'm really being clear about what I actually mean.
Thank you for spending lots of your own time responding with examples etc, that all takes a long time to do and I really appreciate it.
Any progress on how volatility should handle pid/tid/tgid? The bug that referenced this was marked stale, and I'm keen that it doesn't languish if there's already been work done on it. A lot of our plugins should have had the functionality separated out into class methods that can just be handed a process, so hopefully changing how the processes are handed in shouldn't be too tough (the linux pslist just has simple output, but it's shared between different ways of getting the process list if I recall correctly). So it sounds like it should be possible to get something going; the question was whether we wanted to take a bigger bite out of the whole abstract process object idea that was going to wrap all platform processes in the same (or similar) API. Can't remember where we got to with this though, so figured I'd ask... 5:)
Hello @ikelos - I think that the final consensus was to leave PID meaning TGID in the pslist plugin and then PID as PID in other plugins. I still feel a little weird about it, but I can see where people are coming from.
Re actually adding support for scanned tasks, it is something I'd still like to add. I've yet to find a solid example where it does actually find new and useful data that I can share and use as a reference.
I did have a go at making an overly simple generic process here, to have a single way to get some details about a process. Although it looks like I never marked it ready for review... :facepalm: https://github.com/volatilityfoundation/volatility3/pull/1000