Return TypedDict instances from Metadata.to_dict()
Summary
This is a ~~partial~~ implementation for zarr-developers/zarr-python#1773.
So far, I have done the following:
- Make
Metadataabc generic overMapping[str, object] - Define
TypedDictclasses for the metadata models that have a well-typed dict representation - Set return value of the the model's
to_dict()method to the relevantTypedDict - Set type of metadata models
from_dict()methods to the relevantTypedDict
Closes zarr-developers/zarr-python#1773
Notes
- Currently, the
Metadataabc is generic over a type bound byMapping[str, object]. I haven't been able to getMapping[str, JSON]to work as a type bound - The
TypedDicts for the various Metadata classes were inferred from each respective classes'from_dict(),to_dict(), and initializer methods - The codec metadata classes
from_dict()methods utilize a variant of the overloadedparse_named_configurationfunction. This accepts aJSONdata type, causing it to raise a mypy error when called w/ one of theTypedDictclasses. This is the case even if we coerce the TypedDict back to adictbefore calling it. I assume this method is performing runtime validation, so we would not want to substitute it w/ dev-time validation provided by theCodecDictclasses - Two calls in the
ArrayV2Metadata.from_dict()method raise mypy errors due to the theArrayV2MetadataDictbeing coerced to adictto allow it to be mutated - All tests w/ local stores pass
Questions
- In the codec metadata class
from_dict()methods, would it be appropriate to ignore the mypyarg-typeerrors that are raised byparse_named_configuration()? Alternatively,parse_named_configuration()'s signature can be modified to accept aMapping[str, Any]rather thanJSON. This is not equivalent toJSON, but seems to be a sufficient substitute for each place thatparse_named_configurationis currently called - Similar to the previous question, would it be appropriate to ignore the mypy
arg-typeerrors in the body ofArrayV2Metadata.from_dict()? - There is a confusing mypy error:
From what I can understand, this is saying that# src/zarr/core/group.py ... class GroupMetadataDict(TypedDict): """A dictionary representing a group metadata.""" attributes: Mapping[str, Any] node_type: NotRequired[Literal["group"]] ... @dataclass(frozen=True) class AsyncGroup: ... @classmethod async def open(...): ... # src/zarr/core/group.py:218 group_metadata = {**zgroup, "attributes": zattrs} # Non-required key "node_type" not explicitly found in any ** item [typeddict-item]node_typeis never found when the dictionaryzgroupis unpacked. However,node_typeis defined w/NotRequired. I searched around and was unable to find any documentation on this -- guidance on next steps would be appreciated!
Checklist
TODO:
- [ ] Add unit tests and/or doctests in docstrings
- [ ] Add docstrings and API docs for any new/modified user-facing classes and functions
- [ ] New/modified features documented in docs/tutorial.rst
- [ ] Changes documented in docs/release.rst
- [ ] GitHub Actions have all passed
- [ ] Test coverage is 100% (Codecov passes)
I experimented with making the Metadata base class generic and ran into issues with bounding the generic type with dict[str, JSON].
Making the Metadata class generic was easy enough:
# src/zarr/abc/metadata.py
...
T = TypeVar("T", bound=dict[str, JSON])
@dataclass(frozen=True)
class Metadata(Generic[T]):
def to_dict(self) -> T:
...
However, I ran into issues with the TypedDict classes being outside of the bounds of the generic type. For example:
# src/zarr/core/chunk_key_encodings.py
...
class ChunkKeyEncodingDict(TypedDict):
"""A dictionary representing a chunk key encoding configuration."""
name: str
configuration: dict[Literal["separator"], SeparatorLiteral]
@dataclass(frozen=True)
class ChunkKeyEncoding(Metadata[ChunkKeyEncodingDict]): # Type argument "ChunkKeyEncodingDict" of "Metadata" must be a subtype of "dict[str, JSON]"
name: str
separator: SeparatorLiteral = "."
It seems like we need a way to bound the TypedDict value types at the dictionary level. I've tried quite a few things -- I have yet to come across a solution.
Is it possible that this is not supported? Would appreciate any guidance @jhamman @d-v-b
T = TypeVar("T", bound=dict[str, JSON])
I think the problem is specifying the type bound to be dict[str, JSON]. First, typeddicts are supposed to be immutable, so they are not a subclass of dict. Mapping should be used instead.
Second, for some reason Mapping[str, JSON] won't work, but Mapping[str, object] does work as a type bound. I don't really know the answer to this one, hopefully someone can explain it.
Here's an example that I got working:
from abc import abstractmethod
from dataclasses import dataclass
from typing import Any, Generic, TypeVar, Mapping, TypedDict
T = TypeVar('T', bound=Mapping[str, object])
class Meta(Generic[T]):
@abstractmethod
def to_dict(self) -> T:
...
class ExampleSpec(TypedDict):
a: str
b: str
@dataclass
class Example(Meta[ExampleSpec]):
a: str
b: str
def to_dict(self) -> ExampleSpec:
return {'a': self.a, 'b': self.b}
x = Example(a='10', b='10')
y = x.to_dict()
reveal_type(y)
"""
Revealed type is "TypedDict('tessel.ExampleSpec', {'a': builtins.str, 'b': builtins.str})"
"""
Hello @d-v-b!
I rebased and pushed commits to make the Metadata abstract base class generic + required changes to the downstream inheriting classes. I updated the PR descriptions w/ implementation notes + additional questions.
When you have a moment, a review would be super helpful!