# Pydantic-based type checking

Please read the description below. I have run all unit tests locally both with and without pydantic type checking enabled, so the changes are fully compatible (though exception messages have changed).
## Open Issues

The main open issues are:
- Need to agree on how to enable it and roll it out. It's currently done through an env variable, so that the existing behavior is the default, and pydantic type checking (and import-dependency) is opt-in. Here's what I suggest:
- Merge these changes (with no pydantic runtime dependency)
- Bump version to 0.8 (higher than internal version)
- Warn users that we will soon have a pydantic 2 dependency (in case they are still on pydantic v1)
- After some time (TBD), enable pydantic type checking by default if pydantic is importable.
- After some time (TBD), add the pydantic 2 runtime dependency, and make it the default
- After some time (TBD), remove the option to do type checking the old way and delete the old code
- Graph instantiation passes the original (unvalidated) arguments to the underlying function call, rather than the validated ones. While this doesn't cause any existing tests to fail, I want to change this to more fully take advantage of the validation that Pydantic can provide.
## Motivation

Pydantic is the most widely used data validation library for Python. I wanted to leverage it to do the type checking that csp had custom implementations for, in order to:
- Reduce the amount of custom code while improving extensibility and modularization
- Improve performance
- Allow for other ways of building graphs (i.e. pydantic models that contain edges, use of pydantic's `validate_call` decorator for validation, etc.)
- Fix existing issues (such as #181)
In the end, I've probably added as much code as could be removed, but in my opinion it's more compartmentalized and easier to extend. Furthermore, performance is slightly-to-significantly better (baskets in particular are notably improved) and other ways of building graphs with type checking are supported as intended.
## Challenges
The main challenge is csp's handling of "template variables" (TVars), which is fairly unique: the type var/forward ref is resolved based on the input arguments at runtime. This is handled by introducing a custom validation context (`TVarValidationContext`) and leveraging some of the existing code to resolve conflicts.
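As a rough, simplified sketch of the underlying pydantic v2 mechanism (not csp's actual `TVarValidationContext`): a context object passed to `model_validate` is shared across all field validations, so per-field information can be accumulated and reconciled at the end, which is the same pattern used here to resolve TVars from the input arguments.

```python
from typing import Any

from pydantic import BaseModel, ValidationInfo, field_validator


class Model(BaseModel):
    x: Any
    y: Any

    @field_validator("x", "y")
    @classmethod
    def track_types(cls, v: Any, info: ValidationInfo) -> Any:
        # Record the runtime type of each field in a shared context dict,
        # loosely analogous to accumulating TVar resolutions per argument.
        if info.context is not None:
            info.context.setdefault("resolved", {})[info.field_name] = type(v)
        return v


ctx: dict = {}
Model.model_validate({"x": 1, "y": "a"}, context=ctx)
print(ctx["resolved"])  # {'x': <class 'int'>, 'y': <class 'str'>}
```

In the real implementation the context additionally resolves conflicts between competing TVar resolutions (e.g. int vs. float) and triggers revalidation; this sketch only shows the state-sharing mechanism.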
## Examples

Run this before running any of the examples:

```python
import os

os.environ["CSP_PYDANTIC"] = "1"

import csp
from csp import ts
from typing import Dict, Union
```
Graphs (but not nodes) can now take baskets of baskets as inputs (which was not possible before):
```python
@csp.graph
def foo(x: Dict[str, Dict[str, ts[int]]]) -> ts[bool]:
    return csp.const(True)


foo({"x": {"Y": csp.const(0)}})
```
Graphs can take custom pydantic models that include edge types as attributes (useful for grouping together time series of different underlying types):
```python
from pydantic import BaseModel


class MyBundle(BaseModel):
    x: ts[str]
    y: ts[float]
    z: str = ""


@csp.graph
def f(bundle: MyBundle) -> ts[str]:
    return csp.sample(bundle.y, bundle.x)


f(MyBundle(x=csp.const("foo"), y=csp.const(1.0)))
```
Graphs can now also take a Union of ts types as input (not yet as output):

```python
@csp.graph
def foo(x: Union[ts[float], ts[str]]):
    pass


foo(csp.const("x"))
```
Additional types can now be validated as static arguments, e.g. Callable (though pydantic only performs a simple check that the argument is callable; no validation of the arguments, their types, or the return type is performed):
```python
from typing import Callable


@csp.graph
def foo(f: Callable[[float], float], x: ts[float]) -> ts[float]:
    return csp.apply(x, f, float)


foo(lambda x: x, csp.const(1.0))
```
See also https://docs.pydantic.dev/latest/api/standard_library_types/
The pydantic validation decorator can be applied to csp types if only type validation is required:
```python
from pydantic import validate_call


@validate_call(validate_return=True)
def foo(a: str, b: ts[float], c: Dict[str, ts[int]]) -> csp.Outputs(x=ts[float], y=Dict[str, ts[int]]):
    return {"x": b, "y": c}


foo("x", csp.const(0.0), {"A": csp.const(1)})
```
## Future work
- Make the `USE_PYDANTIC` flag the default if `pydantic>2` is found in the environment
- Do not allow `ts` to be `None` by default - enforce use of `Optional`
- Remove support for type hints that are not standard python (i.e. `[int]` instead of `list[int]` or `List[int]`)
- Force dynamic baskets to be declared through the `DynamicBasket[K, V]` type rather than the `Dict[ts[K], ts[V]]` type
- Better support for returning `Union` outputs, especially for `csp.stats`.
- Make csp structs more pydantic compatible (by adding validators/serializers within the pydantic framework, without changing the internal representation).
Note that the second through fourth items would be breaking changes.
## Implementation Details

The implementation consists of the following pieces:
- Existing types (i.e. `TsType`, `Outputs`, `OutputBasket`, etc.) were extended with `__get_pydantic_core_schema__` implementations so that pydantic validation can apply to them. This is enough to enable the use of pydantic models and the pydantic validator with ts types. The complex validation logic for `TsType` is delegated to `TsTypeValidator` (which is a combination of glorified "is subtype" logic and handling of TVars - see below).
- To support the csp TVar logic, new Pydantic types are introduced, which will have special handling: `CspTypeVar` and `CspTypeVarType`, as the TVars are neither ForwardRefs nor TypeVars.
- To support the existing csp type checking behavior, dynamic baskets and the TVar resolution, an `adjust_annotations` function is implemented to adjust the "standard" csp type annotations into fully compliant pydantic annotations.
- The signature of each node is extended to dynamically create a pydantic model for the inputs and outputs based on the adjusted annotations.
- If the `CSP_PYDANTIC` env variable is set, input checking in signature.py and output checking in graph.py use the input/output model for validation, instead of the existing logic.
- To fully handle the TVar logic, a custom validation context (i.e. `TVarValidationContext`) is introduced, which tracks the different resolutions and resolves conflicts, using logic that is nearly identical to the existing implementation (but more generic where possible). This context can maintain state between the validation calls to the different arguments (and sub-arguments for nested structures). In particular, the validation of `CspTypeVar` and `CspTypeVarType` interacts specifically with this validation context. This context is instantiated and passed to the model validation step above. The final step in model validation is to resolve all the detected TVars and to revalidate any fields that have changed type as a result (i.e. int -> float).
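To illustrate the first piece, here is a minimal, hypothetical sketch (not csp's actual code; `Edge` is a toy stand-in) of how a custom type can expose `__get_pydantic_core_schema__` so that pydantic v2 knows how to validate it:

```python
from typing import Any

from pydantic import GetCoreSchemaHandler, TypeAdapter
from pydantic_core import core_schema


class Edge:
    """Toy stand-in for a csp edge-like type (illustrative only)."""

    def __init__(self, name: str):
        self.name = name

    @classmethod
    def __get_pydantic_core_schema__(
        cls, source: Any, handler: GetCoreSchemaHandler
    ) -> core_schema.CoreSchema:
        # Accept existing Edge instances as-is; coerce plain strings into Edges.
        def validate(v: Any) -> "Edge":
            if isinstance(v, Edge):
                return v
            if isinstance(v, str):
                return Edge(v)
            raise TypeError(f"Cannot convert {type(v).__name__} to Edge")

        return core_schema.no_info_plain_validator_function(validate)


ta = TypeAdapter(Edge)
print(ta.validate_python("x").name)  # x
```

Once a type carries such a schema, it can be used as a field annotation in pydantic models and with `validate_call`, which is what makes the `MyBundle` example above work.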
## Un-scientific profiling
```python
import os

os.environ["CSP_PYDANTIC"] = "1"

from typing import Dict, List

import csp
from csp import ts


@csp.graph
def bar(x: Dict[str, List[float]]) -> ts[bool]:
    return csp.const(True)


inp_bar = {f"sym_{i}": list(range(100)) for i in range(1000)}


@csp.graph
def baz(x: Dict[str, ts[int]]) -> ts[bool]:
    return csp.const(True)


inp_baz = {f"key{i}": csp.const(i) for i in range(1000)}


@csp.graph
def qux(x: Dict[str, ts[List[float]]]) -> ts[bool]:
    return csp.const(True)


inp_qux = {f"key{i}": csp.const.using(T=List[float])([]) for i in range(1000)}
```
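For a csp-free sense of what this kind of profiling compares, here is a generic `timeit` sketch (an assumption for illustration, not the benchmark actually used) pitting pydantic's compiled validation of a `Dict[str, List[float]]` basket-shaped input against a naive hand-rolled structural check:

```python
import timeit
from typing import Dict, List

from pydantic import TypeAdapter

# Input shaped like inp_bar above: 1000 keys, 100 floats each.
data = {f"sym_{i}": [float(j) for j in range(100)] for i in range(1000)}

ta = TypeAdapter(Dict[str, List[float]])


def manual_check(d):
    # Naive hand-rolled structural check, standing in for custom type checking.
    assert isinstance(d, dict)
    for k, v in d.items():
        assert isinstance(k, str)
        assert isinstance(v, list)
        for item in v:
            assert isinstance(item, float)
    return d


t_pydantic = timeit.timeit(lambda: ta.validate_python(data), number=10)
t_manual = timeit.timeit(lambda: manual_check(data), number=10)
print(f"pydantic: {t_pydantic:.4f}s, manual: {t_manual:.4f}s")
```

Absolute numbers depend on the machine and pydantic version; the point is only the shape of the comparison, not a definitive result.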