codeql
codeql copied to clipboard
Python: Add type-tracking flow for class (instance) attributes
class MyClass:
def set_foo(self):
self.foo = <value>
def uses(self):
print(self.foo)
This PR adds flow from <value> to the use in print for type-tracking. It also handles instances, and class-level attributes. (reviewing commit-by-commit is recommended)
Implementation questions
The implementation so far uses loadStoreStep to add flow to the self/cls parameters of normal/classmethods on a class. As highlighted in the comment in the code, compared with a "simple" jump-step or levelStepNoCall, this allows any potential flow-summaries to still work.
(note: Ruby currently uses levelStepNoCall, which is where the inspiration for doing so came from).
However, there's a few things I'm not 100% sure of, so let me call those out:
-
whether there's an implicit assumption that
loadStoreStepis not allowed to cross function borders (if so, we should add that as a consistency check). If that's the case, we can implement this withjumpStep, but will sacrifice some functionality since we will need to target attribute-reads directly. -
performance! Since this adds a step from all
nwrites to everymreads of an attribute, worst case is O(n * m) = O(n²) :disappointed: We should be able to overcome this by adding an intermediary node that represents class-instance thatself.foo = <value>can target, resulting in O(n + m) steps :+1: I haven't done anything about this yet, since I wanted to get the basic functionality in first. -
This PR does simple assumption that any
self.foo = <value>will end up being available after the function has finished... but we could see code likeself.foo = <value1>; self.foo = <value2>, where only<value2>would be available afterwards. I don't expect this will become a huge issue, so I propose we wait until we see this being a problem in real code. -
Subclass flow is currently not handled. I think we need proper performance solved first, and also wanted to ensure we could get agreement on basic functionality become making implementation more complex.
-
Instance flow. This felt like the right thing to do, both so we would be able to handle examples like the one below, but also so instance and
selfreference within a method would behave in the same way.custom_sql_conn.connect() # sets the 'cursor' attribute in this function custom_sql_conn.cursor.execute(...)
haven't written a change-note yet, I'll do that later :+1:
- there's an implicit assumption that
loadStoreStepis not allowed to cross function borders
loadStoreSteps are assumed to be local (as are loadSteps and storeSteps).