orbax icon indicating copy to clipboard operation
orbax copied to clipboard

Code duplication around tensorstore_spec logic in orbax-checkpoint

Open minotru opened this issue 1 year ago • 2 comments

Hi Orbax team,

I was looking at Orbax code at the latest version==0.7.0 and found that pieces of code with quite heavy logic around tensorstore_spec creation seem to contain duplicates.

I'd like to know if this code duplication intended by design or I am welcome to submit a PR.

Here get_tensorstore_spec is a part of public API, and I can't find any usage of get_tensorstore_spec by orbax-checkpoint itself https://github.com/google/orbax/blob/8b4e90d573082a5c7caa5f99c51db376f62a6995/checkpoint/orbax/checkpoint/serialization.py#L97C5-L124

And here is a very similar piece of code in build_kvstore_tspec in _internal package, and build_kvstore_tspec is used heavily by type_handlers.py https://github.com/google/orbax/blob/8b4e90d573082a5c7caa5f99c51db376f62a6995/checkpoint/orbax/checkpoint/_src/serialization/tensorstore_utils.py#L62-L135

Would you consider get_tensorstore_spec to reuse build_kvstore_tspec under the hood?


Also, there seems to be a bit of obscurity with default ts_context value.

  • In orbax/checkpoint/serialization.py, there is TS_CONTEXT in public serialization.py that is used as a default value of context in async_serialize (actually, orbax does not use async_serialize anywhere and recommends using async_serialize_shards) , async_serialize_shards, async_deserialize and by StringHandler.
  • At the same time, intype_handlers.py, there is get_ts_context() (it references _DEFAULT_OCDBT_TS_CONTEXT), and get_ts_context is used by all other handler implementations.

So, TS_CONTEXT from serialization.py seems to be never used by common checkpoint IO code.

Should we somehow leave only 1 source of truth for default ts_context values?

minotru avatar Oct 13 '24 18:10 minotru

This code is currently in the process of a rework by @dicentra13. get_tensorstore_spec in serialization.py should be either removed or should reuse build_kvstore_tspec. I think it is only used in a couple places in internal code - just due slower progress in refactoring rather than any inherent need.

Again, async_serialize is used in one place internally that I'm working on eliminating, although probably will leave that function in place as a wrapper.

Agreed that ts_context should use one source of truth, that could use fixing.

cpgaffney1 avatar Oct 14 '24 17:10 cpgaffney1

Got it @cpgaffney1 , thank for the reply! Waiting for the refactoring by @dicentra13 :)

Let's close this issue once refactoring is completed and merged

minotru avatar Oct 16 '24 14:10 minotru