Used TypeDict for CachingVisitor.state
This changes CachingVisitor.state to used a TypedDict. Previously, we used a Mapping[str, Any], which had two problems:
- Risks typos in the key names causing unexpected KeyErrors
- The
Anytype for the values ofstatemean that, at least by default, you won't get any type checking on things using values looked up fromstate.
Unit tests should catch all these, but this can provide some earlier feedback on any issues.
The total=False keyword is used in the TypedDict to indicate that these keys are all optional. mypy doesn't error if you do a lookup on an optional field of a TypedDict, so we do still need to handle missing keys like we have been.
This does limit CachingVisitor a bit, but since we're the only users of that I think that's a good thing.
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
Contributors can view more details about this message here.
FYI @rjzamora this will have some overlap with https://github.com/rapidsai/cudf/pull/19130. Any new keys we add to state will need a field in CachingVisitorState.
Hmm, failing on 3.10: https://github.com/rapidsai/cudf/actions/runs/15594634303/job/43923137155?pr=19135#step:10:7551
│ │ TypeError: cannot inherit from both a TypedDict type and a non-TypedDict base class
This might need to wait until we drop Python 3.10.
Just seeing if CI passes with TypedDict from typing_extensions. The docs say this should work. If it does, we can discuss adding that as a dependency for python=3.10. Maybe not worth it.
I'd like to add a dependency on typing-extensions for just python 3.10. For 3.11 and newer, we don't need typing-extensions. Here's what I think is the desired output:
diff --git a/conda/recipes/cudf-polars/recipe.yaml b/conda/recipes/cudf-polars/recipe.yaml
index 85740e5930..8133c7e8a5 100644
--- a/conda/recipes/cudf-polars/recipe.yaml
+++ b/conda/recipes/cudf-polars/recipe.yaml
@@ -50,6 +50,7 @@ requirements:
- python
- pylibcudf =${{ version }}
- polars >=1.24,<1.31
+ - typing-extensions # py==310
- ${{ pin_compatible("cuda-version", upper_bound="x", lower_bound="x") }}
ignore_run_exports:
by_name:
diff --git a/python/cudf_polars/pyproject.toml b/python/cudf_polars/pyproject.toml
index b3d28e027d..9589a8a3f9 100644
--- a/python/cudf_polars/pyproject.toml
+++ b/python/cudf_polars/pyproject.toml
@@ -21,6 +21,7 @@ requires-python = ">=3.10"
dependencies = [
"polars>=1.25,<1.31",
"pylibcudf==25.8.*,>=0.0.0a0",
+ "typing-extensions; python_version < '3.11'",
] # This list was generated by `rapids-dependency-file-generator`. To make changes, edit ../../dependencies.yaml and run `rapids-dependency-file-generator`.
classifiers = [
"Intended Audience :: Developers",
I'd hoped that https://github.com/rapidsai/cudf/pull/19135/commits/4ed51687f414c3023040f325244e6b719ea8544b would do the trick, but that gives the same output as what's on 25.08 (no change).
Hey @TomAugspurger -- I'm not sure if the dependency file generator knows how to handle version-specific dependencies in a pyproject.toml.
I think it's reasonable to do something like:
@@ -834,13 +834,11 @@ dependencies:
packages:
- polars>=1.25,<1.31
specific:
- - output_types: [conda, requirements, pyproject]
+ - output_types: pyproject
matrices:
- - matrix: {py: "3.10"}
- packages:
- - typing-extensions
- matrix: null
- packages: null
+ packages:
+ - "typing-extensions; python_version < '3.11'"
run_cudf_polars_experimental:
common:
- output_types: [conda, requirements, pyproject]
The other spot where typing_extensions would need to show up is in the conda environment files, but it seems to already be there.
As for the rattler-build recipes, those aren't covered (yet) by dependencies.yaml, so you can go ahead and make the necessary changes directly.
Oh, I hadn't considered embedding the marker I want (python_version < '3.11) inside the depedencies.yaml. That seems to work nicely, thanks.
@gforsyth when you get a chance can you verify that https://github.com/rapidsai/cudf/pull/19135/commits/2077154e5e2827c7db624022597b6be30f7a6888 looks right for the rattler-build side of things? Based on https://rattler.build/latest/selectors/#available-variables it looks OK, but I haven't used rattler-build before, and wasn't sure how to validate it locally.
@rjzamora do you have any concerns with this? This adds a bit more friction to putting something in state (you have to declare it in CachingVisitorState), but I think the benefit is worth it.
@rjzamora do you have any concerns with this? This adds a bit more friction to putting something in state (you have to declare it in CachingVisitorState), but I think the benefit is worth it.
Sorry I missed this earlier. This PR probably moves things in the right direction, so I think the changes are fine.
With that said, I do think it's pretty confusing to use the same CachingVisitorState type for different kinds of visitors that may need to track very different kinds of state. I think the approach taken here is to just include "everything" in a single CachingVisitorState class. I suppose it would be tricky to have separate DecomposeExprState(CachingVisitorState) and LowerIRState(CachingVisitorState) classes, for example?
I felt that awkwardness too. It's identical to what we have today, it's just that declaring the types makes it obvious that we're putting all kinds of things in there. I'll see what I can do.
I pushed some changes that seem to do what we want. It's a bit complicated... but I think worth it. Now we have a LowerIRTransformer type:
LowerIRTransformer: TypeAlias = GenericTransformer[
"ir.IR", "tuple[ir.IR, MutableMapping[ir.IR, PartitionInfo]]", LowerIRState
]
and we know that inside lower_ir_node we always have a config_options available in the state, and nothing else.
If folks are OK with this overall I'll clean up a couple things (docs, where we define these Transformer subtypes) and ping for another round of reviews.
If folks are OK with this overall I'll clean up a couple things (docs, where we define these Transformer subtypes) and ping for another round of reviews.
I'll defer to you on whether the extra effort is worth it, and I'll be happy to review.
/merge