
Custom TypeHandler and "No per-process OCDBT checkpoint subdirs" warning

Open PhilipVinc opened this issue 1 year ago • 4 comments

Hello,

I've recently created a custom type handler. When I use it on a single process, I see the following warning:

WARNING:absl:[process=0][thread=async_save_18] Skipping merge of OCDBT checkpoints: No per-process OCDBT checkpoint subdirs found in /tmp/ckp3/115.orbax-checkpoint-tmp-136/callbacks.orbax-checkpoint-tmp-139, 

The custom type handler I wrote serialises a custom type containing some numpy arrays. If I were to run this across multiple processes, I'd like only the master process to serialise the data (which is basically replicated).

How can I silence this warning? Did I forget to define something?

PhilipVinc avatar Nov 13 '24 10:11 PhilipVinc

The merge is there to allow ArrayHandler to write data to per-process subdirectories, at which point they can be merged to form a "global view" that is used for restoration. In your custom handler the master process is responsible for serializing everything, so you already have a global view.

You could silence the warning by using your own PyTreeCheckpointHandler that just skips the finalize implementation.

Or your custom TypeHandler could write data to ocdbt.process_X on the master process and the merge would be performed on that single subdirectory, so the merge is basically a no-op since there's only one process.
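A third, blunter option is to filter the message at the logging level. This is not one of the two approaches above, just a sketch using the standard library `logging` module. It assumes the record is emitted through the logger named `absl` (which the `WARNING:absl:` prefix suggests) and that the message text keeps starting with "Skipping merge of OCDBT checkpoints". The class name `SkipOcdbtMergeWarning` is made up for illustration.

```python
import logging


class SkipOcdbtMergeWarning(logging.Filter):
    """Drops the 'Skipping merge of OCDBT checkpoints' records, keeps everything else."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False discards the record; True lets it through.
        return "Skipping merge of OCDBT checkpoints" not in record.getMessage()


# Assumption: the warning is logged on the "absl" logger, as the
# "WARNING:absl:" prefix in the output above suggests.
absl_logger = logging.getLogger("absl")
absl_logger.addFilter(SkipOcdbtMergeWarning())
```

It is probably safest to install the filter after `orbax.checkpoint` has been imported, so that absl has already set up its own logger. The filter only hides the message; the merge is still skipped exactly as before.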

cpgaffney1 avatar Nov 13 '24 18:11 cpgaffney1

Thank you @cpgaffney1 .

I think I figured out what the problem was.

If I use a PyTreeSave that contains only types handled by a custom TypeHandler (one that does not create an ocdbt.process_X folder), then this warning is emitted.

This is because PyTreeSave assumes that at least one TypeHandler that creates an ocdbt.process_X directory is used for the tree, but this is not guaranteed.
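To make that concrete, here is a toy sketch of the kind of check that would produce this behaviour. It is not Orbax's actual code: `find_process_subdirs` and `should_merge` are hypothetical names, and the only detail taken from this thread is the `ocdbt.process_X` directory naming.

```python
import tempfile
from pathlib import Path


def find_process_subdirs(checkpoint_dir: Path) -> list[Path]:
    """Returns the ocdbt.process_X subdirectories of checkpoint_dir, sorted by name."""
    return sorted(p for p in checkpoint_dir.glob("ocdbt.process_*") if p.is_dir())


def should_merge(checkpoint_dir: Path) -> bool:
    """A merge only makes sense if at least one per-process subdirectory exists."""
    return bool(find_process_subdirs(checkpoint_dir))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # Tree saved only through a custom handler: it writes its own file
    # (hypothetical name) and never creates an ocdbt.process_X directory.
    custom_only = root / "custom_only"
    custom_only.mkdir()
    (custom_only / "my_custom_data.npz").touch()

    # Tree where some handler wrote a per-process directory.
    with_arrays = root / "with_arrays"
    (with_arrays / "ocdbt.process_0").mkdir(parents=True)

    custom_only_merges = should_merge(custom_only)  # nothing to merge
    with_arrays_merges = should_merge(with_arrays)  # one subdir to merge
```

In the first case there is nothing to merge, which matches the situation where the warning shows up even though the checkpoint itself is fine.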

PhilipVinc avatar Nov 14 '24 10:11 PhilipVinc

I'm seeing this warning without using a custom TypeHandler, just using PyTreeSave. Is that expected?

garymm avatar Feb 13 '25 23:02 garymm

You must have use_ocdbt set to False, is that the case? Ideally we should not be logging a warning in that case since there's nothing to be concerned about.

orbax-dev avatar Feb 18 '25 22:02 orbax-dev