[feat request] Make `Table` / `TableMetadata` JSON serializable
Feature Request / Improvement
The REST Catalog exposes `Table` and `TableMetadata` information through HTTP endpoints in JSON format (link). That information closely mirrors the internal state of the `Table` and `TableMetadata` objects in Python.
It would be great to make these objects JSON serializable.
Example
from pyiceberg.catalog import load_catalog
import json
catalog = load_catalog()
tbl = catalog.load_table("default.taxi_dataset")
json.dumps(tbl)
json.dumps(vars(tbl))
Error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/kevinliu/.pyenv/versions/3.11.0/lib/python3.11/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kevinliu/.pyenv/versions/3.11.0/lib/python3.11/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kevinliu/.pyenv/versions/3.11.0/lib/python3.11/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
  File "/Users/kevinliu/.pyenv/versions/3.11.0/lib/python3.11/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type Table is not JSON serializable
>>> json.dumps(vars(tbl))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/kevinliu/.pyenv/versions/3.11.0/lib/python3.11/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kevinliu/.pyenv/versions/3.11.0/lib/python3.11/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kevinliu/.pyenv/versions/3.11.0/lib/python3.11/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
  File "/Users/kevinliu/.pyenv/versions/3.11.0/lib/python3.11/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type TableMetadataV1 is not JSON serializable
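Both tracebacks end in `JSONEncoder.default`, the hook the standard library calls for any object it has no built-in encoding for; unless it is overridden, it raises `TypeError`. A minimal sketch of that behavior, and of a `default=` hook as a stopgap, is below. `FakeTable`, `FakeMetadata`, `attrs_fallback`, and the `s3://bucket/taxi` location are hypothetical stand-ins for illustration and are not pyiceberg types or real paths.

```python
import json


class FakeMetadata:
    """Hypothetical stand-in for a metadata object; not a pyiceberg class."""

    def __init__(self, format_version, location):
        self.format_version = format_version
        self.location = location


class FakeTable:
    """Hypothetical stand-in for a table object; not a pyiceberg class."""

    def __init__(self, identifier, metadata):
        self.identifier = identifier
        self.metadata = metadata


def attrs_fallback(obj):
    # json calls this hook only for objects it cannot encode natively.
    # Falling back to the instance attributes is an assumption that suits
    # these simple stand-ins; it is not a general-purpose encoder.
    try:
        return vars(obj)
    except TypeError:
        raise TypeError(
            f"Object of type {type(obj).__name__} is not JSON serializable"
        ) from None


tbl = FakeTable(("default", "taxi_dataset"), FakeMetadata(1, "s3://bucket/taxi"))

# Without a hook, the encoder rejects the outermost unknown object.
try:
    json.dumps(tbl)
    error = None
except TypeError as exc:
    error = str(exc)

# With the hook, nested unknown objects are converted one level at a time.
encoded = json.dumps(tbl, default=attrs_fallback)

print(error)
print(encoded)
```

A hook like this only dumps raw attributes, so it would not reproduce the field names or structure that the REST Catalog returns.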
We should be able to (de)serialize it using Pydantic. That's probably also faster.
Oh, thanks for the hint. It looks like `model_dump_json` works:
from pyiceberg.catalog import load_catalog
import json
catalog = load_catalog()
tbl = catalog.load_table("default.taxi_dataset")
tbl.metadata.model_dump_json()
But only on `tbl.metadata`, not on `tbl` itself.
There's already a `__repr__` method defined on `Table`. @Fokko, what do you think about adding another method to `Table` that outputs its JSON representation?
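One shape such a method could take, as a rough sketch only: every name below (`MetadataSketch`, `TableSketch`, `to_json`, the `metadata_location` attribute, and the output keys) is a hypothetical stand-in rather than pyiceberg's API, and the example assumes Pydantic v2 is installed. It reuses the model's own serialization for the metadata and adds the fields that live on the table object itself.

```python
import json
from typing import Tuple

from pydantic import BaseModel


class MetadataSketch(BaseModel):
    """Hypothetical stand-in for TableMetadata, reduced to two fields."""

    format_version: int
    location: str


class TableSketch:
    """Hypothetical stand-in for Table: a plain class wrapping a Pydantic model."""

    def __init__(
        self,
        identifier: Tuple[str, ...],
        metadata: MetadataSketch,
        metadata_location: str,
    ) -> None:
        self.identifier = identifier
        self.metadata = metadata
        self.metadata_location = metadata_location

    def to_json(self) -> str:
        # model_dump(mode="json") returns only JSON-compatible values,
        # so the result can be nested inside a larger document.
        return json.dumps(
            {
                "identifier": list(self.identifier),
                "metadata-location": self.metadata_location,
                "metadata": self.metadata.model_dump(mode="json"),
            }
        )


# Placeholder values for illustration only.
meta = MetadataSketch(format_version=1, location="s3://bucket/taxi")
tbl = TableSketch(
    ("default", "taxi_dataset"), meta, "s3://bucket/taxi/metadata/v1.json"
)

payload = tbl.to_json()

# Deserialization: the metadata portion round-trips through Pydantic.
restored = MetadataSketch.model_validate(json.loads(payload)["metadata"])
```

Whether the output should follow the REST Catalog's response layout or a simpler one is a design choice this sketch does not settle.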