zarr-python icon indicating copy to clipboard operation
zarr-python copied to clipboard

Return TypedDict instances from Metadata.to_dict()

Open ahmadjiha opened this issue 1 year ago • 3 comments

Summary

This is a ~~partial~~ implementation for zarr-developers/zarr-python#1773.

So far, I have done the following:

  1. Make Metadata abc generic over Mapping[str, object]
  2. Define TypedDict classes for the metadata models that have a well-typed dict representation
  3. Set return value of the the model's to_dict() method to the relevant TypedDict
  4. Set type of metadata models from_dict() methods to the relevant TypedDict

Closes zarr-developers/zarr-python#1773

Notes

  1. Currently, the Metadata abc is generic over a type bound by Mapping[str, object]. I haven't been able to get Mapping[str, JSON] to work as a type bound
  2. The TypedDicts for the various Metadata classes were inferred from each respective classes' from_dict(), to_dict(), and initializer methods
  3. The codec metadata classes from_dict() methods utilize a variant of the overloaded parse_named_configuration function. This accepts a JSON data type, causing it to raise a mypy error when called w/ one of the TypedDict classes. This is the case even if we coerce the TypedDict back to a dict before calling it. I assume this method is performing runtime validation, so we would not want to substitute it w/ dev-time validation provided by the CodecDict classes
  4. Two calls in the ArrayV2Metadata.from_dict() method raise mypy errors due to the the ArrayV2MetadataDict being coerced to a dict to allow it to be mutated
  5. All tests w/ local stores pass

Questions

  1. In the codec metadata class from_dict() methods, would it be appropriate to ignore the mypy arg-type errors that are raised by parse_named_configuration()? Alternatively, parse_named_configuration()'s signature can be modified to accept a Mapping[str, Any] rather than JSON. This is not equivalent to JSON, but seems to be a sufficient substitute for each place that parse_named_configuration is currently called
  2. Similar to the previous question, would it be appropriate to ignore the mypy arg-type errors in the body of ArrayV2Metadata.from_dict()?
  3. There is a confusing mypy error:
    # src/zarr/core/group.py
    
    ...
    
    class GroupMetadataDict(TypedDict):
        """A dictionary representing a group metadata."""
    
        attributes: Mapping[str, Any]
        node_type: NotRequired[Literal["group"]]
    
    ...
    
    @dataclass(frozen=True)
    class AsyncGroup:
        ...
        @classmethod
        async def open(...):
            ...
    
            # src/zarr/core/group.py:218
            group_metadata = {**zgroup, "attributes": zattrs}  # Non-required key "node_type" not explicitly found in any ** item  [typeddict-item]
    
    From what I can understand, this is saying that node_type is never found when the dictionary zgroup is unpacked. However, node_type is defined w/ NotRequired. I searched around and was unable to find any documentation on this -- guidance on next steps would be appreciated!

Checklist

TODO:

  • [ ] Add unit tests and/or doctests in docstrings
  • [ ] Add docstrings and API docs for any new/modified user-facing classes and functions
  • [ ] New/modified features documented in docs/tutorial.rst
  • [ ] Changes documented in docs/release.rst
  • [ ] GitHub Actions have all passed
  • [ ] Test coverage is 100% (Codecov passes)

ahmadjiha avatar Aug 19 '24 02:08 ahmadjiha

I experimented with making the Metadata base class generic and ran into issues with bounding the generic type with dict[str, JSON].

Making the Metadata class generic was easy enough:

# src/zarr/abc/metadata.py

...

T = TypeVar("T", bound=dict[str, JSON])

@dataclass(frozen=True)
class Metadata(Generic[T]):
    def to_dict(self) -> T:
        ...

However, I ran into issues with the TypedDict classes being outside of the bounds of the generic type. For example:

# src/zarr/core/chunk_key_encodings.py

...

class ChunkKeyEncodingDict(TypedDict):
    """A dictionary representing a chunk key encoding configuration."""

    name: str
    configuration: dict[Literal["separator"], SeparatorLiteral]


@dataclass(frozen=True)
class ChunkKeyEncoding(Metadata[ChunkKeyEncodingDict]):  # Type argument "ChunkKeyEncodingDict" of "Metadata" must be a subtype of "dict[str, JSON]"
    name: str
    separator: SeparatorLiteral = "."

It seems like we need a way to bound the TypedDict value types at the dictionary level. I've tried quite a few things -- I have yet to come across a solution.

Is it possible that this is not supported? Would appreciate any guidance @jhamman @d-v-b

ahmadjiha avatar Sep 02 '24 00:09 ahmadjiha

T = TypeVar("T", bound=dict[str, JSON])

I think the problem is specifying the type bound to be dict[str, JSON]. First, typeddicts are supposed to be immutable, so they are not a subclass of dict. Mapping should be used instead.

Second, for some reason Mapping[str, JSON] won't work, but Mapping[str, object] does work as a type bound. I don't really know the answer to this one, hopefully someone can explain it.

Here's an example that I got working:

from abc import abstractmethod
from dataclasses import dataclass
from typing import Any, Generic, TypeVar, Mapping, TypedDict

T = TypeVar('T', bound=Mapping[str, object])

class Meta(Generic[T]):

    @abstractmethod
    def to_dict(self) -> T:
        ...

class ExampleSpec(TypedDict):
    a: str
    b: str

@dataclass
class Example(Meta[ExampleSpec]):
    a: str
    b: str

    def to_dict(self) -> ExampleSpec:
        return {'a': self.a, 'b': self.b}
    
x = Example(a='10', b='10')
y = x.to_dict()
reveal_type(y)
"""
Revealed type is "TypedDict('tessel.ExampleSpec', {'a': builtins.str, 'b': builtins.str})"
"""

d-v-b avatar Sep 02 '24 07:09 d-v-b

Hello @d-v-b!

I rebased and pushed commits to make the Metadata abstract base class generic + required changes to the downstream inheriting classes. I updated the PR descriptions w/ implementation notes + additional questions.

When you have a moment, a review would be super helpful!

ahmadjiha avatar Oct 11 '24 04:10 ahmadjiha