Feature: GitHub action enforcing style requirements
@atcuno and I have been discussing ways that we can enforce certain style requirements for new plugins, especially in light of some cleanup that has been done recently regarding confusing variable names and the use of keyword arguments for certain methods. I believe that this can be done accurately using tree-sitter:
- https://github.com/tree-sitter/tree-sitter-python
- https://github.com/tree-sitter/py-tree-sitter
This issue is to keep track of the things that we want to enforce. So far we have the following:
- Require calling the following methods with all-keyword arguments:
pslist.PsList.list_processeshivelist.HiveList.list_hivessymbols.symbol_table_is_64bitssdt.SSDT.build_module_collection
- Disallow importing classes by name into other modules (e.g.
from volatility3.plugins.windows.pslist import PsList) - Disallow method arguments that don't make sense with their type, like
symbol_table: str
I think this can save a lot of time and round-trips on PR review since these things frequently come up, and there are existing places within the codebase that go against this proposed style that we can fix up at the same time.
I'd say anything with more than three arguments should be called with keywords args (regardless of the function), which might be easier to locate?
I've heard of tree-sitter before but never used it myself. I'll be happy to add it as long as it's pretty easy to pick up (and we're confident we've got more than one person that can look after it).
I think the variable name/type matching will be tricky, but I'm happy to see what we can do in that regard. All of these should also be documented in the coding guide which still needs review (see #1579 ).
RE: importing classes by name, currently we are at:
volatility3/framework/symbols/linux/__init__.py:import volatility3.framework.symbols.linux.utilities.modules as linux_utilities_modules
volatility3/framework/plugins/linux/keyboard_notifiers.py:import volatility3.framework.symbols.linux.utilities.modules as linux_utilities_modules
volatility3/framework/plugins/linux/tty_check.py:import volatility3.framework.symbols.linux.utilities.modules as linux_utilities_modules
volatility3/framework/plugins/linux/tracing/tracepoints.py:import volatility3.framework.symbols.linux.utilities.modules as linux_utilities_modules
volatility3/framework/plugins/linux/tracing/ftrace.py:import volatility3.framework.symbols.linux.utilities.modules as linux_utilities_modules
volatility3/framework/plugins/linux/netfilter.py:import volatility3.framework.symbols.linux.utilities.modules as linux_utilities_modules
volatility3/framework/plugins/linux/kthreads.py:import volatility3.framework.symbols.linux.utilities.modules as linux_utilities_modules
volatility3/framework/plugins/linux/check_idt.py:import volatility3.framework.symbols.linux.utilities.modules as linux_utilities_modules
volatility3/framework/plugins/windows/debugregisters.py:import volatility3.plugins.windows.pslist as pslist
volatility3/framework/plugins/windows/debugregisters.py:import volatility3.plugins.windows.threads as threads
volatility3/framework/plugins/windows/debugregisters.py:import volatility3.plugins.windows.pe_symbols as pe_symbols
volatility3/framework/plugins/windows/suspended_threads.py:import volatility3.plugins.windows.pslist as pslist
volatility3/framework/plugins/windows/suspended_threads.py:import volatility3.plugins.windows.threads as threads
volatility3/framework/plugins/windows/suspended_threads.py:import volatility3.plugins.windows.pe_symbols as pe_symbols
volatility3/framework/plugins/windows/amcache.py:from volatility3.framework.symbols.windows.extensions import registry as reg_extensions
volatility3/framework/plugins/windows/svcscan.py:from volatility3.framework.symbols.windows.extensions import services as services_types
volatility3/framework/plugins/windows/scheduled_tasks.py:from volatility3.framework.symbols.windows.extensions import registry as reg_extensions
volatility3/framework/layers/scanners/__init__.py:from volatility3.framework.layers.scanners import multiregexp as multiregexp
volatility3/framework/layers/resources.py: from smb import SMBHandler as SMBHandler # lgtm [py/unused-import]
volatility3/framework/configuration/__init__.py:from volatility3.framework.configuration import requirements as requirements
@ikelos are these okay? They are very new so I assume it is, but we will need to document it as an exception and have the github action skip them:
import volatility3.framework.symbols.linux.utilities.modules as linux_utilities_modules
Well, just modules may conflict with other imports or variables, in which case the renaming is fine as long as it makes sense.
It is still a module being imported, rather than a class, so that should be fine?
There will likely be a little learning curve around tree-sitter for whoever else helps maintain that, especially writing queries, but I can add really thorough comments explaining those, and I'm also happy to write up a doc with a good workflow around creating/troubleshooting queries.
Here's a basic example of how this might work: let's say that we want to find all instances of calls to pslist.PsList.list_processes that have non-keyword arguments, and tag them in the PR.
If the call adheres to our style-guidelines, tree-sitter will produce a syntax tree for the method call node that looks like this:
(call ; [931, 16] - [936, 17]
function: (attribute ; [931, 16] - [931, 44]
object: (attribute ; [931, 16] - [931, 29]
object: (identifier) ; [931, 16] - [931, 22]
attribute: (identifier)) ; [931, 23] - [931, 29]
attribute: (identifier)) ; [931, 30] - [931, 44]
arguments: (argument_list ; [931, 44] - [936, 17]
(keyword_argument ; [932, 20] - [932, 40]
name: (identifier) ; [932, 20] - [932, 27]
value: (attribute ; [932, 28] - [932, 40]
object: (identifier) ; [932, 28] - [932, 32]
attribute: (identifier))) ; [932, 33] - [932, 40]
(keyword_argument ; [933, 20] - [933, 48]
name: (identifier) ; [933, 20] - [933, 30]
value: (attribute ; [933, 31] - [933, 48]
object: (identifier) ; [933, 31] - [933, 37]
attribute: (identifier))) ; [933, 38] - [933, 48]
(keyword_argument ; [934, 20] - [934, 57]
name: (identifier) ; [934, 20] - [934, 32]
value: (attribute ; [934, 33] - [934, 57]
object: (identifier) ; [934, 33] - [934, 39]
attribute: (identifier))) ; [934, 40] - [934, 57]
(keyword_argument ; [935, 20] - [935, 57]
name: (identifier) ; [935, 20] - [935, 31]
value: (attribute ; [935, 32] - [935, 57]
object: (identifier) ; [935, 32] - [935, 36]
attribute: (identifier)))))
Each node and child node is kept within a set of parentheses, and the semicolons followed by the [row, col] information in brackets are just comments indicating the text range of the node within the document. Tree-sitter gives us access to those values when we interact with nodes through the API, and this is what allows us to auto-tag those in the PR.
If we want to capture all nodes that are list_processes method calls, we can write this basic query:
(call
function: (attribute) @method ( #eq? @method "pslist.PsList.list_processes" )
arguments: (argument_list) @args
) @call_to_check
Then, in Python via the tree-sitter API, we would check each of the direct child nodes of the arguments field (an argument_list node) to ensure that they are all keyword_argument nodes:
import textwrap
import unittest
import tree_sitter
from tree_sitter_python import language as python_language
PYTHON_LANGUAGE = tree_sitter.Language(python_language())
PSLIST_QUERY = PYTHON_LANGUAGE.query(
"""
(call
function: (attribute) @method ( #eq? @method "pslist.PsList.list_processes" ) ; We only want to check these calls
arguments: (argument_list) @args ; Capture the arguments node for easy inspection later
) @call_to_check
"""
)
def calls_are_valid(document_text: str) -> bool:
parser = tree_sitter.Parser(PYTHON_LANGUAGE)
parsed_tree = parser.parse(document_text.encode("utf-8"))
captures = PSLIST_QUERY.captures(parsed_tree.root_node)
for capture_name, nodes in captures.items():
if capture_name == "args":
for child in nodes[0].children:
if not child.is_named:
continue
if not child.type == "keyword_argument":
print(f"Argument at row {child.start_point.row}, column {child.start_point.column} is non-keyword")
return False
return True
class TestValidation(unittest.TestCase):
def test_valid_call(self):
CALL = textwrap.dedent(
"""
pslist.PsList.list_processes(
context=self.context,
layer_name=kernel.layer_name,
symbol_table=kernel.symbol_table_name,
filter_func=self._conhost_proc_filter,
)
"""
)
assert calls_are_valid(CALL)
def test_invalid_call(self):
CALL = textwrap.dedent(
"""
pslist.PsList.list_processes(
self.context,
layer_name=kernel.layer_name,
symbol_table=kernel.symbol_table_name,
filter_func=self._conhost_proc_filter,
)
"""
)
assert not calls_are_valid(CALL)
if __name__ == "__main__":
unittest.main()
Are there more tests to add related to this ticket @dgmcdona ?
Did we have a PR that merged this already? If so, we can tag it under the development section please.
This one mentioned this ticket: https://github.com/volatilityfoundation/volatility3/pull/1743
The title is kind of broad as we wouldn't enforce all the requirements at once (or maybe anytime soon). So I guess its either leave this one open indefinitely or close it and then just make new tickets/PRs as more actions are made.
I am not sure how to do this:
"tag it under the development section please."
Can you explain and I can link them in future?
There's a panel on the side (just below Relationships and Milestone) called development, where PRs that would close this ticket off can be put. We should link all the bugs that deal with this under there to save people going through pages of data to see if anything's being done about it. So if #1743 closes this or helps towards closing it, it should live there. If it did close it, we should mark this as closed too.
The only gotcha is that if open PRs get merged, they risk closing off the ticket even if they're only partial fixes, so it needs a little attention when being done, but for ones we think are already complete, that's the place to keep track of what closed it.
I feel like this has been resolved already? Can we mark it as closed or is there still something outstanding?