Rewrite the checkpointing code to dump one file per process instead of many files, and use large (64-bit) offsets.
format:
magicNumber (64 bits)
format (32 bits)
kmerLength (32 bits)
numberOfMPIRanks (32 bits)
sections (32 bits)
<sectionKey><sectionOffset (64 bits)><bytes in container>
<sectionKey><sectionOffset (64 bits)><bytes in container>
<sectionKey><sectionOffset (64 bits)><bytes in container>
<sectionKey><sectionOffset (64 bits)><bytes in container>
<sectionKey><sectionOffset (64 bits)><bytes in container>
<sectionContent>
<sectionContent>
<sectionContent>
<sectionContent>
<sectionContent>
Each sectionKey will be a handle for a checkpoint; in the source code, these handles will be associated with string names.
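A rough sketch of how this layout could be written and read back, in Python for brevity (not Ray's actual code). Several details are assumptions not stated above: little-endian byte order, a 32-bit sectionKey, a 64-bit "bytes in container" field, offsets counted from the start of the file, and the hypothetical MAGIC value and function names.

```python
import struct

# Assumed values and widths; nothing here is taken from Ray's source.
MAGIC = 0x5261794368656B70  # hypothetical 64-bit magic number

# magicNumber (64), format (32), kmerLength (32), numberOfMPIRanks (32), sections (32)
FIXED_HEADER = struct.Struct("<QIIII")
# sectionKey (assumed 32), sectionOffset (64), bytes in container (assumed 64)
TABLE_ENTRY = struct.Struct("<IQQ")


def write_checkpoint(stream, format_version, kmer_length, mpi_ranks, sections):
    """Write the fixed header, the section table, then every section's content.

    `sections` maps an integer sectionKey to that section's payload bytes.
    """
    items = sorted(sections.items())
    # The first payload starts right after the header and the section table.
    offset = FIXED_HEADER.size + TABLE_ENTRY.size * len(items)
    stream.write(FIXED_HEADER.pack(MAGIC, format_version, kmer_length,
                                   mpi_ranks, len(items)))
    for key, payload in items:
        stream.write(TABLE_ENTRY.pack(key, offset, len(payload)))
        offset += len(payload)
    for _, payload in items:
        stream.write(payload)


def read_section_table(stream):
    """Return {sectionKey: (sectionOffset, size)} read from the file header."""
    stream.seek(0)
    magic, _fmt, _kmer, _ranks, count = FIXED_HEADER.unpack(
        stream.read(FIXED_HEADER.size))
    if magic != MAGIC:
        raise ValueError("not a checkpoint file")
    table = {}
    for _ in range(count):
        key, offset, size = TABLE_ENTRY.unpack(stream.read(TABLE_ENTRY.size))
        table[key] = (offset, size)
    return table


def read_section(stream, table, key):
    """Seek directly to one section using its 64-bit offset and read it."""
    offset, size = table[key]
    stream.seek(offset)
    return stream.read(size)
```

With a table like this at the front of the file, a single section can be loaded with one seek, without scanning the sections stored before it.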
This will need a Ray option to manage checkpoints, like:
Ray manage-checkpoints list checkpointDirectory -----> list checkpoints
Ray manage-checkpoints remove checkpointDirectory checkpointName -----> remove a checkpoint