Custom TypeHandler and "No per-process OCDBT checkpoint subdirs" warning
Hello,
I've recently created a custom type handler. When I use it on a single process, I see the following warning:
WARNING:absl:[process=0][thread=async_save_18] Skipping merge of OCDBT checkpoints: No per-process OCDBT checkpoint subdirs found in /tmp/ckp3/115.orbax-checkpoint-tmp-136/callbacks.orbax-checkpoint-tmp-139,
The custom type handler I wrote serialises a custom type containing some numpy arrays. If I were to run this across multiple processes, I'd like only the master process to serialise the data (which is basically replicated).
How can I silence this warning? Did I forget to define something?
The merge is there to allow ArrayHandler to write data to per-process subdirectories, at which point they can be merged to form a "global view" that is used for restoration. In your custom handler the master process is responsible for serializing everything, so you already have a global view.
You could silence the warning by using your own PyTreeCheckpointHandler that just skips the finalize implementation.
Or your custom TypeHandler could write data to ocdbt.process_X on the master process and the merge would be performed on that single subdirectory, so the merge is basically a no-op since there's only one process.
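A blunter alternative that does not touch Orbax at all is to filter the message at the logging level. This is only a minimal sketch: it assumes the warning reaches the standard-library logger named "absl" (as the `WARNING:absl:` prefix suggests), and `SkipOcdbtMergeWarning` and `install_filter` are hypothetical names, not part of Orbax or absl. It hides the symptom rather than fixing the cause, so the two options above are preferable if they fit your setup.

```python
import logging


class SkipOcdbtMergeWarning(logging.Filter):
    """Drops only records containing the OCDBT 'Skipping merge' message."""

    def filter(self, record: logging.LogRecord) -> bool:
        # getMessage() applies any %-style args before the text is inspected.
        # Returning False drops the record; everything else passes through.
        return "Skipping merge of OCDBT checkpoints" not in record.getMessage()


def install_filter(logger_name: str = "absl") -> logging.Filter:
    """Attaches the filter to the named logger and returns it.

    Assumption: call this after orbax/absl have been imported, so that the
    "absl" logger has already been created by absl itself.
    """
    log_filter = SkipOcdbtMergeWarning()
    logging.getLogger(logger_name).addFilter(log_filter)
    return log_filter
```

Calling `install_filter()` once at start-up should be enough; keep the returned filter if you want to remove it later with `removeFilter`.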
Thank you @cpgaffney1 .
I think I figured out what the problem was...
If I use a PyTreeSave that contains only types handled by a custom TypeHandler (one that does not create an ocdbt.process_X folder), then this warning is logged.
This is because PyTreeSave assumes that at least one type handler that creates an OCDBT directory is used for the collection, but this is not guaranteed.
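To check whether this applies to a given save, one can look for the per-process folders directly. The helper below is a rough sketch, not Orbax code: `per_process_ocdbt_subdirs` is a hypothetical name, and it assumes the folders follow the ocdbt.process_X naming mentioned above and sit directly inside the directory named in the warning.

```python
from pathlib import Path


def per_process_ocdbt_subdirs(checkpoint_dir):
    """Returns the sorted names of ocdbt.process_* subdirectories, if any."""
    root = Path(checkpoint_dir)
    # Only direct children are considered; plain files with a matching
    # name are ignored.
    return sorted(p.name for p in root.glob("ocdbt.process_*") if p.is_dir())
```

An empty list for the item directory would match the situation described here: no handler created a per-process folder, so there is nothing to merge.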
I'm seeing this warning without using a custom TypeHandler, just using PyTreeSave. Is that expected?
You must have use_ocdbt set to False, is that the case? Ideally we should not be logging a warning in that case since there's nothing to be concerned about.