Ability to pickle the `Catalog`

Open Fokko opened this issue 1 year ago • 7 comments

Feature Request / Improvement

This would allow the `Catalog` object to be distributed within Ray.

Fokko, Mar 11 '24 15:03

Copied from the other thread: https://github.com/apache/iceberg-python/issues/513#issuecomment-2009875953

I was looking at the Pydantic docs, and it looks like there's a helpful function that supports pickling and unpickling: https://docs.pydantic.dev/latest/concepts/serialization/#pickledumpsmodel

kevinjqliu, Mar 20 '24 15:03

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot], Sep 17 '24 00:09

@dev-goyal I see that you've merged ray-project/ray#46889. Curious whether you think pickling the Catalog or Table is still valuable to the Ray project.

kevinjqliu, Sep 18 '24 20:09

More context: https://github.com/ray-project/ray/pull/42235#discussion_r1520929199

kevinjqliu, Sep 18 '24 20:09

> @dev-goyal I see that you've merged ray-project/ray#46889 curious if you think pickling the Catalog or Table is still valuable to the Ray project

I think it could help, but in my case recreating the catalog each time did not add much overhead. So it would likely just make the code cleaner.

dev-goyal, Oct 02 '24 19:10

> So, it will likely just make the code cleaner is all

Makes sense. The difference is between directly pickling the catalog object so that it can be distributed to Ray workers and the current implementation, which passes the `catalog_kwargs` so that the catalog is reconstructed on each worker node.

kevinjqliu, Oct 02 '24 20:10
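
To make the two approaches in the last comment concrete, here is a minimal sketch using only the standard library. `DemoCatalog`, `_rebuild_catalog`, and the URI are hypothetical placeholders for illustration; this is not the pyiceberg `Catalog` API, and it only assumes that a real catalog holds some live resource that cannot be pickled as-is.

```python
import pickle
import threading


class DemoCatalog:
    """Hypothetical stand-in for a catalog; not the pyiceberg Catalog API."""

    def __init__(self, name, **properties):
        self.name = name
        self.properties = dict(properties)
        # Stand-in for a live resource (connection, session) that
        # pickle cannot serialize directly.
        self._lock = threading.Lock()

    def __reduce__(self):
        # Tell pickle to rebuild the object by calling the constructor
        # again with the original arguments, rather than copying live state.
        return (_rebuild_catalog, (self.name, self.properties))


def _rebuild_catalog(name, properties):
    # Module-level helper so that pickle can locate it by name.
    return DemoCatalog(name, **properties)


# Approach 1: ship only the kwargs and rebuild the catalog on the worker.
catalog_kwargs = {"name": "demo", "uri": "http://localhost:8181"}
rebuilt = DemoCatalog(**catalog_kwargs)

# Approach 2: pickle the catalog object itself and let it rebuild on load.
original = DemoCatalog("demo", uri="http://localhost:8181")
restored = pickle.loads(pickle.dumps(original))
```

In both cases the worker ends up constructing a fresh catalog from the same configuration. The difference is only whether the caller passes the kwargs around explicitly or the object carries them through `pickle`, which matches the "cleaner code" point made above.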