zarr-python icon indicating copy to clipboard operation
zarr-python copied to clipboard

[v3] Refactor metadata classes

Open normanrz opened this issue 2 years ago • 4 comments

In the current V3 draft version, all metadata classes are closely modeled after the json representation. For example, the regular chunk grid is represented as this json object:

{
  "name": "regular",
  "configuration": { "chunk_shape": [2, 5] }
}

In Python, these objects are implemented as 2 classes

@frozen
class RegularChunkGridConfigurationMetadata:
    chunk_shape: ChunkCoords


@frozen
class RegularChunkGridMetadata:
    configuration: RegularChunkGridConfigurationMetadata
    name: Literal["regular"] = "regular"

ChunkGridMetadata = Union[RegularChunkGridMetadata]

This is an artifact of using cattrs for (de)serialization. However, we were thinking of dropping the attrs dependency. That would open the way for a more ergonomic class design that serializes to the same json structure.

class ChunkGrid(ABC):
    pass

class RegularChunkGrid(ChunkGrid):
    chunk_shape: ChunkCoords

normanrz avatar Jan 03 '24 17:01 normanrz

@normanrz - I like this suggestion. Perhaps you could expand on what the serialization/deserialization flow might look like. In the proposal to basically unpack the JSON object by key in a sequential pattern like this:

zarr_json = json.loads(...)

if zarr_json["node_type"] == "array":
    ...

    if zarr_json["chunk_grid"]["name"] == "regular":
        chunk_grid = RegularChunkGrid(chunk_shape=zarr_json["chunk_grid"]["configuration"]["chunk_shape"])

array_meta = ArrayMetadata(..., chunk_grid=chunk_grid, ...)

jhamman avatar Jan 11 '24 01:01 jhamman

I think it could work via delegation from the parent classes:

class ChunkGrid:
    @classmethod
    def from_dict(cls, val: Dict[str, Any]) -> ChunkGrid:
        assert isinstance(val, dict)
        name = val["name"]
        if name == "regular":
            return RegularChunkGrid.from_dict(val)
        else:
            raise NotImplementedError

class RegularChunkGrid(ChunkGrid):
    @classmethod
    def from_dict(cls, val: Dict[str, Any]) -> RegularChunkGrid:
        assert isinstance(val, dict)
        assert val["name"] == "regular"
        assert _validate_int_list(val["configuration"]["chunk_shape"])
        return cls(chunk_shape=tuple(val["configuration"]["chunk_shape"]))

...

zarr_json = json.loads(...)

if zarr_json["node_type"] == "array":
    ...

    chunk_grid = ChunkGrid.from_dict(zarr_json["chunk_grid"])

array_meta = ArrayMetadata(..., chunk_grid=chunk_grid, ...)

With cattrs we had nice input validation including reasonably understandable error messages. We need to decide if we want to add that now or later.

In the future this could be extended so that ChunkGrid.from_dict uses an extensible registry for different chunk grid variants.

normanrz avatar Jan 11 '24 08:01 normanrz

@d-v-b you refactored this quite a bit. Would you agree to consider this done?

normanrz avatar Apr 22 '24 12:04 normanrz