orbax
Tensorstore spec configuration
Hi Orbax community,
Under the hood, Orbax uses TensorStore for tensor IO; the TensorStore integration lives in type_handlers.py.
TensorStore comes with KVStore implementations for File, GCS, S3, GRPC, etc.
Unfortunately, the TensorStore integration is quite rigid and cannot be extended from client code. Namely, it only supports the 'gcs' and 'file' kvstores, and their specs are hard-coded and cannot be configured.
I think Orbax should provide a way to configure a custom kvstore spec based on the directory, i.e. make type_handlers.py flexible and extensible and avoid the hard-coded if/else. That way, as a user, I could supply an alternative "directory -> kvstore spec" mapping when needed and patch other parts of the tspec too.
For instance, there could be a class like:
```python
from abc import ABC, abstractmethod

class TsSpecStrategyBase(ABC):
    @abstractmethod
    def supported_for(self, directory: str) -> bool:
        ...

    @abstractmethod
    def get_spec(self, directory: str) -> dict:
        ...

class FileTsSpecStrategy(TsSpecStrategyBase):
    ...

class GrpcTsSpecStrategy(TsSpecStrategyBase):
    ...

class TsSpecStrategyResolver:
    def register_strategy(self, strategy: TsSpecStrategyBase) -> None:
        ...

    def resolve(self, directory: str) -> TsSpecStrategyBase:
        ...
```
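To make the proposal concrete, the skeleton above could be filled in roughly as follows. This is a minimal sketch under several assumptions: specs are plain JSON dicts (TensorStore also accepts JSON specs, e.g. via `ts.Spec(...)`), any path without a URL scheme is treated as local, and the `yt://<address>/<path>` layout, the `tsgrpc_kvstore` driver name and its `address` field are illustrative and should be checked against the TensorStore kvstore documentation.

```python
from abc import ABC, abstractmethod


class TsSpecStrategyBase(ABC):
    """Maps a checkpoint directory to a TensorStore kvstore spec (JSON dict)."""

    @abstractmethod
    def supported_for(self, directory: str) -> bool: ...

    @abstractmethod
    def get_spec(self, directory: str) -> dict: ...


class FileTsSpecStrategy(TsSpecStrategyBase):
    def supported_for(self, directory: str) -> bool:
        # Assumption: anything without a URL scheme is a local path.
        return "://" not in directory

    def get_spec(self, directory: str) -> dict:
        return {"driver": "file", "path": directory}


class GrpcTsSpecStrategy(TsSpecStrategyBase):
    """Hypothetical: routes yt://<address>/<path> to a gRPC kvstore."""

    def supported_for(self, directory: str) -> bool:
        return directory.startswith("yt://")

    def get_spec(self, directory: str) -> dict:
        address, _, path = directory[len("yt://"):].partition("/")
        # Driver name and fields are assumptions; verify against your TensorStore build.
        return {"driver": "tsgrpc_kvstore", "address": address, "path": path}


class TsSpecStrategyResolver:
    def __init__(self) -> None:
        self._strategies: list[TsSpecStrategyBase] = []

    def register_strategy(self, strategy: TsSpecStrategyBase) -> None:
        # Later registrations are checked first, so users can shadow built-ins.
        self._strategies.insert(0, strategy)

    def resolve(self, directory: str) -> TsSpecStrategyBase:
        for strategy in self._strategies:
            if strategy.supported_for(directory):
                return strategy
        raise ValueError(f"No TensorStore spec strategy for {directory!r}")
```

Checking the most recently registered strategy first is one design choice: it lets a user-registered strategy take over paths that a built-in one would otherwise claim.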
Motivation: my colleagues have implemented tsgrpc-compatible storage that we want to use for checkpoints. Unfortunately, we can't use it without custom patches to the Orbax code. Namely:
- we've added another if/else to `_get_tensorstore_spec` to activate the `grpc` driver for paths like `yt://*`;
- we've added support for patching `tspec['metadata']`, `tspec['kvstore']['config']` and `spec['kvstore']` (ocdbt) via environment variables, in order to configure `experimental_read_coalescing_interval` and disable compression.
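The environment-variable patching described above might look like the following minimal sketch. The variable name `ORBAX_TSPEC_PATCH` and the recursive-merge semantics are hypothetical and not part of Orbax; the example patch relies on the zarr (v2) convention that `"compressor": null` stores chunks uncompressed.

```python
import copy
import json
import os


def deep_merge(base: dict, patch: dict) -> dict:
    """Returns a copy of `base` with `patch` merged in recursively."""
    merged = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            # Non-dict values (including None) replace the existing entry.
            merged[key] = copy.deepcopy(value)
    return merged


def apply_env_patch(tspec: dict, env_var: str = "ORBAX_TSPEC_PATCH") -> dict:
    """Merges a JSON patch read from a (hypothetical) env variable into `tspec`."""
    raw = os.environ.get(env_var)
    if not raw:
        return tspec
    return deep_merge(tspec, json.loads(raw))
```

For example, `ORBAX_TSPEC_PATCH='{"metadata": {"compressor": null}}'` would turn off compression for a zarr spec while leaving its kvstore untouched.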
I would be happy to help and submit a PR.
Regards, Simon
Hi Simon,
This seems quite reasonable to me. We don't have a mechanism for mirroring external code into our internal repository, but an external contributor could probably just submit a PR, and we can then merge it manually internally.
My thought is that we could have something like:
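```python
from typing import Any, Protocol, Sequence

import orbax.checkpoint as ocp
import tensorstore as ts
from jax.experimental.array_serialization import serialization

ParamInfo = ocp.type_handlers.ParamInfo

class TensorStoreSpecStrategy(Protocol):
    def read_spec(self, info: ParamInfo, args: ocp.RestoreArgs) -> ts.Spec:
        ...
    def write_spec(self, value: Any, info: ParamInfo, args: ocp.SaveArgs) -> ts.Spec:
        ...

class ArrayHandler(ocp.type_handlers.TypeHandler):
    # Other constructor arguments elided.
    def __init__(self, tensorstore_strategy: TensorStoreSpecStrategy):
        self._tensorstore_strategy = tensorstore_strategy

    async def serialize(self, values: Sequence[Any], infos: Sequence[ParamInfo], args: Sequence[ocp.SaveArgs]):
        commit_futures = []
        for value, info, arg in zip(values, infos, args):
            spec = self._tensorstore_strategy.write_spec(value, info, arg)
            # JAX's async_serialize appends commit futures to the list passed as commit_future.
            await serialization.async_serialize(value, spec, commit_future=commit_futures)
        return commit_futures

    async def deserialize(self, infos: Sequence[ParamInfo], args: Sequence[ocp.RestoreArgs]):
        ...  # Analogous, using self._tensorstore_strategy.read_spec(info, arg).
```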
This would be very customizable, though you might have to write a bit more code than you otherwise would. However, I think `TensorStoreSpecStrategy` could also factor out methods like `metadata`, which sets the metadata field in the `ts.Spec`, and a `common_spec` method that `read_spec` and `write_spec` can delegate to.
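A sketch of that factoring, with `read_spec` and `write_spec` both delegating to `common_spec` and `metadata` as a separate override point. To stay self-contained it uses plain JSON dicts instead of `ts.Spec`, and simplified hypothetical signatures (`path`, `shape`, `dtype`) in place of `ParamInfo`, `SaveArgs` and `RestoreArgs`; the class names and the `gzip` default compressor are placeholders.

```python
from typing import Any, Protocol


class TensorStoreSpecStrategy(Protocol):
    def read_spec(self, path: str) -> dict: ...
    def write_spec(self, path: str, shape: tuple[int, ...], dtype: str) -> dict: ...


class DefaultZarrStrategy:
    """Hypothetical default strategy built from two overridable helpers."""

    def common_spec(self, path: str) -> dict[str, Any]:
        # Shared by reads and writes: array driver plus kvstore.
        return {"driver": "zarr", "kvstore": {"driver": "file", "path": path}}

    def metadata(self, shape: tuple[int, ...], dtype: str) -> dict[str, Any]:
        # Placeholder compressor; only needed when creating an array.
        return {"shape": list(shape), "dtype": dtype, "compressor": {"id": "gzip"}}

    def read_spec(self, path: str) -> dict:
        return self.common_spec(path)

    def write_spec(self, path: str, shape: tuple[int, ...], dtype: str) -> dict:
        spec = self.common_spec(path)
        spec["metadata"] = self.metadata(shape, dtype)
        return spec


class UncompressedStrategy(DefaultZarrStrategy):
    """Disables compression by overriding only the metadata hook."""

    def metadata(self, shape: tuple[int, ...], dtype: str) -> dict[str, Any]:
        meta = super().metadata(shape, dtype)
        meta["compressor"] = None  # zarr v2: null compressor means uncompressed
        return meta
```

With this split, a client customization such as disabling compression only touches `metadata`, while kvstore routing stays in `common_spec`.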
Hi Colin,
Great, I will be happy to contribute. Thanks, I like the code sample you provided; it looks very configurable.
Let me clarify one thing.
I assumed there would be strategies like `TensorStoreFileStrategy`, `TensorStoreGCSStrategy`, `MyTensorStoreStrategy`, ..., plus a `TensorStoreStrategyResolver` that returns a strategy instance based on the given directory (`ParamInfo`). This `TensorStoreStrategyResolver` would be passed to the `ArrayHandler` constructor.
In your code sample, however, `ArrayHandler` accepts the `tensorstore_strategy` itself, not a resolver. From your point of view, should there be a resolver instead? Or could there be a special type of strategy, like a `TensorStoreRouterStrategy`, that serves as the resolver?
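One possible shape for the router idea is a strategy that satisfies the same protocol and forwards each call to a sub-strategy chosen by path prefix, so `ArrayHandler` would still take a single strategy. The sketch below is hypothetical: the class names, prefix matching, simplified `path`-only signatures, and JSON dicts in place of `ts.Spec` are all assumptions.

```python
from typing import Protocol


class TensorStoreSpecStrategy(Protocol):
    def read_spec(self, path: str) -> dict: ...
    def write_spec(self, path: str) -> dict: ...


class FileStrategy:
    def read_spec(self, path: str) -> dict:
        return {"driver": "zarr", "kvstore": {"driver": "file", "path": path}}

    def write_spec(self, path: str) -> dict:
        return self.read_spec(path)


class GcsStrategy:
    def read_spec(self, path: str) -> dict:
        bucket, _, key = path[len("gs://"):].partition("/")
        return {"driver": "zarr", "kvstore": {"driver": "gcs", "bucket": bucket, "path": key}}

    def write_spec(self, path: str) -> dict:
        return self.read_spec(path)


class TensorStoreRouterStrategy:
    """Forwards each call to the first sub-strategy whose prefix matches the path."""

    def __init__(self, routes: list[tuple[str, TensorStoreSpecStrategy]],
                 default: TensorStoreSpecStrategy) -> None:
        self._routes = routes
        self._default = default

    def _pick(self, path: str) -> TensorStoreSpecStrategy:
        for prefix, strategy in self._routes:
            if path.startswith(prefix):
                return strategy
        return self._default

    def read_spec(self, path: str) -> dict:
        return self._pick(path).read_spec(path)

    def write_spec(self, path: str) -> dict:
        return self._pick(path).write_spec(path)
```

Because the router is itself a strategy, the handler API does not need a separate resolver parameter; routing stays an implementation detail of whichever strategy the user passes in.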
Hi @minotru, we will finalize the design in an internal meeting and come up with a design by the end of the week. Then you can help implement it! Thanks in advance!
@minotru I would like to give you a quick update. We have decided to go with @cpgaffney1's design. However, we will implement the Resolver because TensorStoreSpecStrategy can already achieve the customization. A custom TensorStoreSpecStrategy can also live in client code, so no integration with Orbax is needed.
Hi @ChromeHearts,
Could you please clarify? It seems like "not" is missing in this sentence:
... will (not?) implement the Resolver because TensorStoreSpecStrategy can already achieve ...
Overall, there are some gaps in the design that I am not sure about.
I think an extended version of @cpgaffney1's code sample would help me understand the design better.
Could you please illustrate your vision for:
- how the built-in TensorStoreSpecStrategy is initialized and how it is registered with the standard type handler instances;
- what the built-in strategy should look like to support both FS and GCS and stay neat;
- what the extension mechanism should look like from user code, e.g. adding S3 support while keeping FS and GCS.
Hi @ChromeHearts , @cpgaffney1, could you please circle back here?
Sorry for the late reply!
Yes, Orbax team will not implement the Resolver.
To answer your questions:
- The current TensorStore spec logic will be factored into a TensorStoreSpecStrategy, which type handlers will use by default (i.e. when no custom strategy is supplied).
- Simple file system and GCS paths are already supported by the default strategy. If you need support for custom gRPC storage, you can use a type handler with a custom strategy. If you need all of them supported by the same strategy, you can create a custom strategy based on the default one with additional gRPC support.
- The default strategy already supports FS, GCS and S3, depending on path prefixes such as /somepath, gs:// and s3://.
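Extending the default strategy with gRPC support while keeping file, GCS and S3 handling could look like the following. `DefaultStrategy` is a hypothetical stand-in, since the thread does not show the real default strategy's internals; likewise the `yt://` scheme, the `tsgrpc_kvstore` driver name and its `address` field are assumptions to verify against the TensorStore documentation.

```python
class DefaultStrategy:
    """Hypothetical stand-in for the default strategy: picks a kvstore by path prefix."""

    def kvstore(self, path: str) -> dict:
        for scheme, driver in (("gs://", "gcs"), ("s3://", "s3")):
            if path.startswith(scheme):
                bucket, _, key = path[len(scheme):].partition("/")
                return {"driver": driver, "bucket": bucket, "path": key}
        # No recognized scheme: treat it as a local file system path.
        return {"driver": "file", "path": path}

    def read_spec(self, path: str) -> dict:
        return {"driver": "zarr", "kvstore": self.kvstore(path)}

    def write_spec(self, path: str) -> dict:
        return self.read_spec(path)


class GrpcAwareStrategy(DefaultStrategy):
    """Adds yt:// (gRPC) paths; every other prefix falls through to the default."""

    def kvstore(self, path: str) -> dict:
        if path.startswith("yt://"):
            address, _, key = path[len("yt://"):].partition("/")
            # Assumed driver name and fields; check the TensorStore kvstore docs.
            return {"driver": "tsgrpc_kvstore", "address": address, "path": key}
        return super().kvstore(path)
```

Overriding only the kvstore-selection hook and falling back to `super()` keeps the built-in prefixes unchanged, and the custom class can live entirely in client code.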
Let me know if you have any other questions! Thanks!