# Pydantic-based type checking

Please read the description below. I have run all unit tests locally both with and without pydantic type checking enabled, so the changes are fully compatible (though exception messages have changed).
## Open Issues

The main open issues are:
- Need to agree on how to enable it and roll it out. It's currently done through an env variable, so that the existing behavior is the default, and pydantic type checking (and import-dependency) is opt-in. Here's what I suggest:
- Merge these changes (with no pydantic runtime dependency)
- Bump version to 0.8 (higher than internal version)
- Warn users that we will soon have a pydantic 2 dependency (in case they are still on pydantic v1)
- After some time (TBD), enable pydantic type checking by default if pydantic is importable.
- After some time (TBD), add the pydantic 2 runtime dependency, and make it the default
- After some time (TBD), remove the option to do type checking the old way and delete the old code
- Graph instantiation passes the original (unvalidated) arguments to the underlying function call, rather than the validated ones. While this doesn't cause any existing tests to fail, I want to change this to more fully take advantage of the validation that Pydantic can provide.
## Motivation

Pydantic is the most widely used data validation library for Python. I wanted to leverage it to do the type checking that csp had custom implementations for, in order to:
- Reduce the amount of custom code while improving extensibility and modularization
- Improve performance
- Allow for other ways of building graphs (i.e. pydantic models that contain edges, use of pydantic's `validate_call` decorator for validation, etc.)
- Fix existing issues (such as #181)
In the end, I've probably added as much code as could be removed, but in my opinion it's more compartmentalized and easier to extend. Furthermore, performance is slightly-to-significantly better (baskets in particular are notably improved) and other ways of building graphs with type checking are supported as intended.
## Challenges
The main challenge is csp's handling of "template variables" (TVars), which is fairly unique: the type var/forward ref is resolved based on the input arguments at runtime. This is handled by introducing a custom validation context (`TVarValidationContext`) and leveraging some of the existing code to resolve conflicts.
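As a rough, simplified sketch of the underlying pydantic v2 mechanism (not csp's actual `TVarValidationContext`): a context object passed to `model_validate` is shared across all field validations, so per-field information can be accumulated and reconciled at the end, which is the same pattern used here to resolve TVars from the input arguments.

```python
from typing import Any

from pydantic import BaseModel, ValidationInfo, field_validator


class Model(BaseModel):
    x: Any
    y: Any

    @field_validator("x", "y")
    @classmethod
    def track_types(cls, v: Any, info: ValidationInfo) -> Any:
        # Record the runtime type of each field in a shared context dict,
        # loosely analogous to accumulating TVar resolutions per argument.
        if info.context is not None:
            info.context.setdefault("resolved", {})[info.field_name] = type(v)
        return v


ctx: dict = {}
Model.model_validate({"x": 1, "y": "a"}, context=ctx)
print(ctx["resolved"])  # {'x': <class 'int'>, 'y': <class 'str'>}
```

In the real implementation the context additionally resolves conflicts between competing TVar resolutions (e.g. int vs. float) and triggers revalidation; this sketch only shows the state-sharing mechanism.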
## Examples

Run this before running any of the examples:

```python
import os

os.environ["CSP_PYDANTIC"] = "1"

import csp
from csp import ts
from typing import Dict, Union
```
Graphs (but not nodes) can now take baskets of baskets as inputs (which was not possible before):
```python
@csp.graph
def foo(x: Dict[str, Dict[str, ts[int]]]) -> ts[bool]:
    return csp.const(True)


foo({"x": {"Y": csp.const(0)}})
```
Graphs can take custom pydantic models that include edge types as attributes (useful for grouping together time series of different underlying types):
```python
from pydantic import BaseModel


class MyBundle(BaseModel):
    x: ts[str]
    y: ts[float]
    z: str = ""


@csp.graph
def f(bundle: MyBundle) -> ts[str]:
    return csp.sample(bundle.y, bundle.x)


f(MyBundle(x=csp.const("foo"), y=csp.const(1.0)))
```
Graphs can now also take a Union of ts types as input (not yet as output):

```python
@csp.graph
def foo(x: Union[ts[float], ts[str]]):
    pass


foo(csp.const("x"))
```
Additional types can now be validated as static arguments, e.g. Callable (though pydantic only performs a simple check that the argument is callable; no validation of the arguments, their types, or the return type is performed):
```python
from typing import Callable


@csp.graph
def foo(f: Callable[[float], float], x: ts[float]) -> ts[float]:
    return csp.apply(x, f, float)


foo(lambda x: x, csp.const(1.0))
```
See also https://docs.pydantic.dev/latest/api/standard_library_types/
The pydantic validation decorator can be applied to csp types if only type validation is required:
```python
from pydantic import validate_call


@validate_call(validate_return=True)
def foo(a: str, b: ts[float], c: Dict[str, ts[int]]) -> csp.Outputs(x=ts[float], y=Dict[str, ts[int]]):
    return {"x": b, "y": c}


foo("x", csp.const(0.0), {"A": csp.const(1)})
```
## Future work
- Make the `USE_PYDANTIC` flag the default if `pydantic>2` is found in the environment
- Do not allow `ts` to be `None` by default - enforce use of `Optional`
- Remove support for type hints that are not standard python (i.e. `[int]` instead of `list[int]` or `List[int]`)
- Force dynamic baskets to be declared through the `DynamicBasket[K, V]` type rather than the `Dict[ts[K], ts[V]]` type
- Better support for returning `Union` outputs, especially for `csp.stats`.
- Make csp structs more pydantic compatible (by adding validators/serializers within the pydantic framework, without changing the internal representation).
Note that the second through fourth items would be breaking changes.
## Implementation Details

The implementation consists of the following pieces:
- Existing types (i.e. `TsType`, `Outputs`, `OutputBasket`, etc.) were extended with `__get_pydantic_core_schema__` implementations so that pydantic validation can apply to them. This is enough to enable the use of pydantic models and the pydantic validator with ts types. The complex validation logic for `TsType` is delegated to `TsTypeValidator` (which is a combination of glorified "is subtype" logic and handling of TVars - see below).
- To support the csp TVar logic, new Pydantic types are introduced, which will have special handling: `CspTypeVar` and `CspTypeVarType`, as the TVars are neither ForwardRefs nor TypeVars.
- To support the existing csp type checking behavior, dynamic baskets and the TVar resolution, an `adjust_annotations` function is implemented to adjust the "standard" csp type annotations into fully compliant pydantic annotations.
- The signature of each node is extended to dynamically create a pydantic model for the inputs and outputs based on the adjusted annotations.
- If the `CSP_PYDANTIC` env variable is set, input checking in signature.py and output checking in graph.py use the input/output model for validation, instead of the existing logic.
- To fully handle the TVar logic, a custom validation context (i.e. `TVarValidationContext`) is introduced, which tracks the different resolutions and resolves conflicts, using logic that is nearly identical to the existing implementation (but more generic where possible). This context can maintain state between the validation calls to the different arguments (and sub-arguments for nested structures). In particular, the validation of `CspTypeVar` and `CspTypeVarType` interacts specifically with this validation context. This context is instantiated and passed to the model validation step above. The final step in model validation is to resolve all the detected TVars and to revalidate any fields that have changed type as a result (i.e. int -> float).
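To illustrate the first piece, here is a minimal, hypothetical sketch (not csp's actual code; `Edge` is a toy stand-in) of how a custom type can expose `__get_pydantic_core_schema__` so that pydantic v2 knows how to validate it:

```python
from typing import Any

from pydantic import GetCoreSchemaHandler, TypeAdapter
from pydantic_core import core_schema


class Edge:
    """Toy stand-in for a csp edge-like type (illustrative only)."""

    def __init__(self, name: str):
        self.name = name

    @classmethod
    def __get_pydantic_core_schema__(
        cls, source: Any, handler: GetCoreSchemaHandler
    ) -> core_schema.CoreSchema:
        # Accept existing Edge instances as-is; coerce plain strings into Edges.
        def validate(v: Any) -> "Edge":
            if isinstance(v, Edge):
                return v
            if isinstance(v, str):
                return Edge(v)
            raise TypeError(f"Cannot convert {type(v).__name__} to Edge")

        return core_schema.no_info_plain_validator_function(validate)


ta = TypeAdapter(Edge)
print(ta.validate_python("x").name)  # x
```

Once a type carries such a schema, it can be used as a field annotation in pydantic models and with `validate_call`, which is what makes the `MyBundle` example above work.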
## Un-scientific profiling
```python
import os

os.environ["CSP_PYDANTIC"] = "1"

from typing import Dict, List

import csp
from csp import ts


@csp.graph
def bar(x: Dict[str, List[float]]) -> ts[bool]:
    return csp.const(True)


inp_bar = {f"sym_{i}": list(range(100)) for i in range(1000)}


@csp.graph
def baz(x: Dict[str, ts[int]]) -> ts[bool]:
    return csp.const(True)


inp_baz = {f"key{i}": csp.const(i) for i in range(1000)}


@csp.graph
def qux(x: Dict[str, ts[List[float]]]) -> ts[bool]:
    return csp.const(True)


inp_qux = {f"key{i}": csp.const.using(T=List[float])([]) for i in range(1000)}
```
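For a csp-free sense of what this kind of profiling compares, here is a generic `timeit` sketch (an assumption for illustration, not the benchmark actually used) pitting pydantic's compiled validation of a `Dict[str, List[float]]` basket-shaped input against a naive hand-rolled structural check:

```python
import timeit
from typing import Dict, List

from pydantic import TypeAdapter

# Input shaped like inp_bar above: 1000 keys, 100 floats each.
data = {f"sym_{i}": [float(j) for j in range(100)] for i in range(1000)}

ta = TypeAdapter(Dict[str, List[float]])


def manual_check(d):
    # Naive hand-rolled structural check, standing in for custom type checking.
    assert isinstance(d, dict)
    for k, v in d.items():
        assert isinstance(k, str)
        assert isinstance(v, list)
        for item in v:
            assert isinstance(item, float)
    return d


t_pydantic = timeit.timeit(lambda: ta.validate_python(data), number=10)
t_manual = timeit.timeit(lambda: manual_check(data), number=10)
print(f"pydantic: {t_pydantic:.4f}s, manual: {t_manual:.4f}s")
```

Absolute numbers depend on the machine and pydantic version; the point is only the shape of the comparison, not a definitive result.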