
Orbax cannot save numpy array with dtype=np.object_

Open · amifalk opened this issue 10 months ago · 1 comment

Are there any plans to support this in Orbax? TensorStore already has a string data type: https://google.github.io/tensorstore/python/api/tensorstore.string.html

I realize I could pull the object arrays out into a JSON file and stitch them back together when loading, but that isn't ergonomic, since they are logically connected in my workflow.

import os
import numpy as np
import orbax.checkpoint as ocp

test = {'a': np.array([True, False, np.nan], dtype=np.object_),
        'b': np.array(['x', 'y', 'z'], dtype=np.object_)}

ckptr = ocp.StandardCheckpointer()
ckptr.save(f'{os.getcwd()}/test', test)

ValueError: Error parsing object member "dtype": Unsupported data type: "object" [source locations='tensorstore/internal/json_binding/json_binding.h:383\ntensorstore/internal/json_binding/json_binding.h:524\ntensorstore/internal/json_binding/json_binding.h:861\ntensorstore/internal/json_binding/json_binding.h:825']
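The JSON workaround mentioned in the issue (pull the object arrays out into a JSON file and stitch them back together on load) can be sketched as follows. This is a minimal sketch assuming a flat dict of arrays like `test` in the example; `split_object_arrays` and `merge_object_arrays` are hypothetical helpers, not Orbax APIs. The numeric part would still go through `StandardCheckpointer`, while the object part is serialized with Python's `json` module, which writes `nan` as the non-standard token `NaN` and reads it back.

```python
import json

import numpy as np


def split_object_arrays(tree):
    """Hypothetical helper: separate object-dtype arrays from a flat dict.

    Returns (numeric, objects). Object arrays are converted to nested Python
    lists with .tolist() so they can be written with json.dump/json.dumps.
    """
    numeric, objects = {}, {}
    for key, value in tree.items():
        if isinstance(value, np.ndarray) and value.dtype == object:
            objects[key] = value.tolist()
        else:
            numeric[key] = value
    return numeric, objects


def merge_object_arrays(numeric, objects):
    """Hypothetical helper: rebuild the original dict after loading both parts."""
    merged = dict(numeric)
    for key, values in objects.items():
        merged[key] = np.array(values, dtype=object)
    return merged


test = {'a': np.array([True, False, np.nan], dtype=np.object_),
        'b': np.array(['x', 'y', 'z'], dtype=np.object_),
        'c': np.arange(3.0)}

numeric, objects = split_object_arrays(test)
# In practice `numeric` would be saved with ocp.StandardCheckpointer and
# `objects` written to a JSON file beside it; a string stands in for the file.
sidecar = json.dumps(objects)
restored = merge_object_arrays(numeric, json.loads(sidecar))
```

For nested pytrees the same split would have to be applied per leaf, which is exactly the bookkeeping that makes this approach less ergonomic than native support.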

amifalk, Apr 09 '24 14:04

I recognize that it's not the most convenient solution, but you could also implement a custom TypeHandler to deal with this. It would be a relatively simple override of the existing NumpyHandler.
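A full `TypeHandler` subclass depends on Orbax's handler method signatures, which this sketch deliberately does not assume. Instead it shows the encoding step such an override of `NumpyHandler` would need: turning an object-dtype array into plain `uint8`/`int64` arrays that TensorStore can store, and back. `encode_object_array` and `decode_object_array` are hypothetical helpers, and encoding each element as JSON is an assumption that covers the bools, `NaN`, and strings from the example but not arbitrary Python objects. Applied to the pytree before `ckptr.save` and after restoring, the same functions also keep everything in one checkpoint without any custom handler.

```python
import json

import numpy as np


def encode_object_array(arr):
    """Hypothetical helper: encode an object array as plain numeric arrays.

    Each element is serialized with json.dumps (so it must be a JSON-serializable
    Python value such as bool, float, str or None), UTF-8 encoded, and
    concatenated into one uint8 buffer. Element i occupies
    data[offsets[i]:offsets[i + 1]].
    """
    chunks = [json.dumps(x).encode('utf-8') for x in arr.ravel()]
    offsets = np.cumsum([0] + [len(c) for c in chunks], dtype=np.int64)
    data = np.frombuffer(b''.join(chunks), dtype=np.uint8).copy()
    shape = np.array(arr.shape, dtype=np.int64)
    return {'data': data, 'offsets': offsets, 'shape': shape}


def decode_object_array(enc):
    """Hypothetical helper: invert encode_object_array."""
    raw = enc['data'].tobytes()
    offsets = enc['offsets']
    out = np.empty(len(offsets) - 1, dtype=object)
    for i in range(len(out)):
        # Assign element by element so list-valued items are stored as objects.
        out[i] = json.loads(raw[offsets[i]:offsets[i + 1]].decode('utf-8'))
    return out.reshape(tuple(int(n) for n in enc['shape']))


enc = encode_object_array(np.array(['x', 'y', 'z'], dtype=np.object_))
# `enc` holds only uint8/int64 arrays, so e.g. {'b': enc} could be passed to
# ckptr.save in place of the original object array.
```

Storing offsets instead of padding every element to a fixed width keeps variable-length strings compact; the trade-off is that no single element can be decoded without the offsets array.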

cpgaffney1, Apr 09 '24 21:04