[feat request] Make `Table` / `TableMetadata` JSON serializable

Open kevinjqliu opened this issue 1 year ago • 3 comments

Feature Request / Improvement

The REST Catalog exposes Table and TableMetadata information through HTTP endpoints in JSON format. This information closely mirrors the internal state of the Table and TableMetadata objects in Python.

It would be great to make these JSON serializable.

Example

from pyiceberg.catalog import load_catalog
import json
catalog = load_catalog()
tbl = catalog.load_table("default.taxi_dataset")
json.dumps(vars(tbl))

Error

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/kevinliu/.pyenv/versions/3.11.0/lib/python3.11/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kevinliu/.pyenv/versions/3.11.0/lib/python3.11/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kevinliu/.pyenv/versions/3.11.0/lib/python3.11/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
  File "/Users/kevinliu/.pyenv/versions/3.11.0/lib/python3.11/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type Table is not JSON serializable
>>> json.dumps(vars(tbl))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/kevinliu/.pyenv/versions/3.11.0/lib/python3.11/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kevinliu/.pyenv/versions/3.11.0/lib/python3.11/json/encoder.py", line 200, in encode
    chunks = self.iterencode(o, _one_shot=True)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kevinliu/.pyenv/versions/3.11.0/lib/python3.11/json/encoder.py", line 258, in iterencode
    return _iterencode(o, 0)
           ^^^^^^^^^^^^^^^^^
  File "/Users/kevinliu/.pyenv/versions/3.11.0/lib/python3.11/json/encoder.py", line 180, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type TableMetadataV1 is not JSON serializable
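
For context, here is a minimal sketch of why both calls fail, using a stand-in class rather than pyiceberg itself. The standard-library encoder only handles dicts, lists, strings, numbers, booleans and None, and raises TypeError for anything else unless a `default` callable is passed. `FakeMetadata`, `to_jsonable` and the example location are hypothetical names made up for illustration.

```python
import json


class FakeMetadata:
    """Hypothetical stand-in for a metadata object; not a pyiceberg class."""

    def __init__(self, format_version, location):
        self.format_version = format_version
        self.location = location


def to_jsonable(obj):
    """Fallback for json.dumps: expose an object's attributes as a dict."""
    if hasattr(obj, "__dict__"):
        return vars(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Like vars(tbl): a plain dict whose values are still arbitrary objects.
state = {"metadata": FakeMetadata(1, "s3://bucket/tbl")}

try:
    json.dumps(state)
    plain_error = None
except TypeError as exc:
    # The encoder gives up on the first value it does not recognise.
    plain_error = str(exc)

# With a fallback, the same dict encodes without error.
encoded = json.dumps(state, default=to_jsonable)
```

A `default` hook like this gets past the error, but it dumps raw attributes with no control over field names, which is part of why a purpose-built serializer would be preferable.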

kevinjqliu • Mar 20 '24 04:03

We should be able to (de)serialize it using Pydantic. That's probably also faster.

Fokko • Mar 20 '24 15:03

Oh, thanks for the hint. It looks like the model_dump_json function works:

from pyiceberg.catalog import load_catalog
import json
catalog = load_catalog()
tbl = catalog.load_table("default.taxi_dataset")
tbl.metadata.model_dump_json()

But this only works on tbl.metadata, not on tbl itself.

kevinjqliu • Mar 20 '24 15:03

There's already a __repr__ function defined for the Table object. @Fokko, what do you think about adding another function to Table that outputs its JSON representation?
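
One possible shape for such a function, sketched with stand-in classes so it runs without a catalog. Everything here is an assumption made for illustration: the helper name `table_to_json`, the output keys, and the `FakeTable` / `FakeMetadata` stand-ins. The only thing taken from this thread is that tbl.metadata offers model_dump_json().

```python
import json
from dataclasses import dataclass


@dataclass
class FakeMetadata:
    """Stand-in for tbl.metadata; it only mimics model_dump_json()."""

    format_version: int
    location: str

    def model_dump_json(self) -> str:
        return json.dumps(
            {"format-version": self.format_version, "location": self.location}
        )


@dataclass
class FakeTable:
    """Stand-in for Table; the attribute names are assumed, not verified."""

    identifier: tuple
    metadata: FakeMetadata
    metadata_location: str


def table_to_json(tbl) -> str:
    """Hypothetical helper: wrap the metadata JSON with table-level fields."""
    payload = {
        "identifier": list(tbl.identifier),
        "metadata-location": tbl.metadata_location,
        # Parse the metadata JSON so it nests as an object, not a quoted string.
        "metadata": json.loads(tbl.metadata.model_dump_json()),
    }
    return json.dumps(payload)


tbl = FakeTable(
    identifier=("default", "taxi_dataset"),
    metadata=FakeMetadata(1, "s3://bucket/taxi_dataset"),
    metadata_location="s3://bucket/taxi_dataset/metadata/v1.metadata.json",
)
result = json.loads(table_to_json(tbl))
```

Reusing the metadata's own serializer keeps the nested document identical to what tbl.metadata.model_dump_json() already produces. Which table-level fields belong in the output is an open question.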

kevinjqliu avatar Mar 20 '24 15:03 kevinjqliu
