codeql icon indicating copy to clipboard operation
codeql copied to clipboard

Python: Add type-tracking flow for class (instance) attributes

Open RasmusWL opened this issue 1 year ago • 2 comments
trafficstars

class MyClass:
    def set_foo(self):
        self.foo = <value>

    def uses(self):
        print(self.foo)

This PR adds flow from <value> to the use in print for type-tracking. It also handles instances, and class-level attributes. (reviewing commit-by-commit is recommended)

Implementation questions

The implementation so far uses loadStoreStep to add flow to the self/cls parameters of normal/classmethods on a class. As highlighted in the comment in the code, compared with a "simple" jump-step or levelStepNoCall, this allows any potential flow-summaries to still work.

(note: Ruby currently uses levelStepNoCall, which is where the inspiration for doing so came from).

However, there's a few things I'm not 100% sure of, so let me call those out:

  1. whether there's an implicit assumption that loadStoreStep is not allowed to cross function borders (if so, we should add that as a consistency check). If that's the case, we can implement this with jumpStep, but will sacrifice some functionality since we will need to target attribute-reads directly.

  2. performance! Since this adds a step from all n writes to every m reads of an attribute, worst case is O(n * m) = O(n²) :disappointed: We should be able to overcome this by adding an intermediary node that represents class-instance that self.foo = <value> can target, resulting in O(n + m) steps :+1: I haven't done anything about this yet, since I wanted to get the basic functionality in first.

  3. This PR does simple assumption that any self.foo = <value> will end up being available after the function has finished... but we could see code like self.foo = <value1>; self.foo = <value2>, where only <value2> would be available afterwards. I don't expect this will become a huge issue, so I propose we wait until we see this being a problem in real code.

  4. Subclass flow is currently not handled. I think we need proper performance solved first, and also wanted to ensure we could get agreement on basic functionality become making implementation more complex.

  5. Instance flow. This felt like the right thing to do, both so we would be able to handle examples like the one below, but also so instance and self reference within a method would behave in the same way.

    custom_sql_conn.connect() # sets the 'cursor' attribute in this function
    custom_sql_conn.cursor.execute(...)
    

RasmusWL avatar Jun 04 '24 12:06 RasmusWL

haven't written a change-note yet, I'll do that later :+1:

RasmusWL avatar Jun 10 '24 08:06 RasmusWL

  1. there's an implicit assumption that loadStoreStep is not allowed to cross function borders

loadStoreSteps are assumed to be local (as are loadSteps and storeSteps).

hvitved avatar Jun 10 '24 09:06 hvitved