DeepRec
[OSPP 2022] DeepRec supports exporting models to key-value NoSQL databases
Motivation Currently, DeepRec supports exporting models to checkpoint files, but import and export performance suffers when the model weight file is large. Key-value NoSQL databases (such as LevelDB, Redis, and RocksDB) offer high performance, high scalability, and support for large data volumes. We add this feature to improve model import and export performance while meeting the storage needs of more users.
Design To achieve better import and export performance, we add new ops that manipulate the database directly. This avoids repeatedly reading and writing model files on disk and thus reduces time overhead.
The overall design can be divided into three parts.
The first part is a generic interface for persisting key-value data, so that different key-value databases can be supported through the same calls.
The second part is to add an op implementation in the op kernel to import and export models. This op saves the Variable/EmbeddingVariable values in memory directly to the database through database calls or loads the models directly from the database.
The third part is to add the op in the process of building the graph.
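The first two parts can be sketched in plain Python to show the intended layering. This is only an illustration under stated assumptions, not DeepRec's actual op or API: the names `KVStore`, `InMemoryStore`, `save_variable`, `load_variable`, and `save_embedding`, the float32 byte encoding, and the `name/feature_id` key layout are all hypothetical. A real backend would wrap LevelDB, Redis, or RocksDB behind the same interface, and the real ops would live in C++ op kernels.

```python
from abc import ABC, abstractmethod
from array import array
from typing import Dict, Iterator, List, Optional


class KVStore(ABC):
    """Hypothetical generic persistence interface (part one of the design)."""

    @abstractmethod
    def put(self, key: bytes, value: bytes) -> None: ...

    @abstractmethod
    def get(self, key: bytes) -> Optional[bytes]: ...

    @abstractmethod
    def keys(self, prefix: bytes = b"") -> Iterator[bytes]: ...


class InMemoryStore(KVStore):
    """Stand-in backend so the sketch runs without a database."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def keys(self, prefix: bytes = b"") -> Iterator[bytes]:
        # Sorted iteration mimics the ordered scans LevelDB/RocksDB provide.
        return (k for k in sorted(self._data) if k.startswith(prefix))


def save_variable(store: KVStore, name: str, values: List[float]) -> None:
    """Export a dense variable as packed float32 bytes (assumed encoding)."""
    store.put(name.encode(), array("f", values).tobytes())


def load_variable(store: KVStore, name: str) -> List[float]:
    """Import a dense variable written by save_variable."""
    raw = store.get(name.encode())
    if raw is None:
        raise KeyError(name)
    out = array("f")
    out.frombytes(raw)
    return out.tolist()


def save_embedding(store: KVStore, name: str,
                   table: Dict[int, List[float]]) -> None:
    """Export a sparse embedding table, one row per feature id (assumed layout)."""
    for feature_id, row in table.items():
        store.put(f"{name}/{feature_id}".encode(), array("f", row).tobytes())


store = InMemoryStore()
save_variable(store, "dense/kernel", [0.5, 1.5, -2.0])
save_embedding(store, "emb", {7: [1.0, 2.0], 42: [0.25, 0.75]})
print(load_variable(store, "dense/kernel"))
print(list(store.keys(b"emb/")))
```

Keeping the ops dependent only on the abstract interface is what would let one import/export op serve several databases.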
In the traditional checkpoint saving method, the BundleEntryProto storage format is used to map each entry to the checkpoint files. In the database, we simplify this step by adding key-value mappings such as node key lists. In addition, in distributed training, the parameter server (ps) is responsible for parameter updates. Except for StringJoin, save/ShardedFilename/shard, and save/num_shards, the ops in the saving process are executed on ps, so the model saving process only needs to consider the ps side. When the data is too large, the save op can be placed on each device by using the sharded parameter. In that case the meta information from the different devices needs to be merged to form a complete checkpoint, and we need to rewrite this process.
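As a rough illustration of the node key list and the merge step, the sketch below has each shard write its tensors together with its own key list, then merges the per-shard lists into one. Everything here is an assumption made for illustration: the reserved key `__node_keys__`, the JSON encoding, the functions `write_shard` and `merge_key_lists`, and the use of a plain dict in place of the database.

```python
import json
from typing import Dict, List

KEY_LIST = "__node_keys__"  # hypothetical reserved key for the node key list


def write_shard(db: Dict[str, bytes], shard_id: int,
                tensors: Dict[str, bytes]) -> None:
    """Each ps-side save op writes its tensors plus its own key list."""
    for name, payload in tensors.items():
        db[name] = payload
    db[f"{KEY_LIST}/{shard_id}"] = json.dumps(sorted(tensors)).encode()


def merge_key_lists(db: Dict[str, bytes], num_shards: int) -> List[str]:
    """Merge the per-shard key lists into one complete list.

    This plays the role that merging per-device meta information plays
    when a sharded checkpoint is assembled.
    """
    merged = set()
    for shard_id in range(num_shards):
        merged.update(json.loads(db[f"{KEY_LIST}/{shard_id}"]))
    keys = sorted(merged)
    db[KEY_LIST] = json.dumps(keys).encode()
    return keys


db: Dict[str, bytes] = {}
write_shard(db, 0, {"emb/part_0": b"a"})
write_shard(db, 1, {"emb/part_1": b"b", "dense/bias": b"c"})
print(merge_key_lists(db, num_shards=2))
```

A loader can then read the merged list first and fetch every tensor by key, without knowing which device wrote it.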
Additional. To make it easier for users to inspect parameters, we also plan to implement a file viewer that can display Variable/EmbeddingVariable values and supports searching for values.