Hi, please help me with a memory leak issue
Hello, attached below is the recent memory usage of our restate-server; it looks like there is some kind of memory leak issue.
The `WorkflowSerivce > getState` function in our app is called very frequently, so I think it must be related. What I want to know is: does the `ObjectClient` release its memory each time after it reads the unique Object from the DB into memory? If not, 10,000 users would cause a lot of memory consumption. Hope I made myself clear, I really need your help, thanks!
```ts
// WorkflowSerivce
// (assumed imports, not shown in the original snippet: `client` from
// "@restatedev/restate-sdk-clients" and `restate` from "@restatedev/restate-sdk")
static getRestateClient() {
  const url = WorkflowSerivce.getRestateServerUrl();
  return client.connect({
    url: url,
  });
}

async getState(stateOwnerId: string, schema: MachineSchema) {
  const restateClient = WorkflowSerivce.getRestateClient();
  // Key identifying the virtual object instance that owns this state
  const ownerKey = MachineSchema.getUniqueOjectKey(
    schema.getProduct(),
    schema.getName(),
    stateOwnerId
  );
  // Typed client proxy for the StateOwnerObject addressed by ownerKey
  const stateOwnerClient = restateClient.objectClient(
    StateOwnerObject,
    ownerKey
  );
  return await stateOwnerClient.getState();
}

// StateOwnerObject
async getState(ctx: restate.ObjectContext): Promise<StateOwnerSnapshot> {
  // Read the snapshot stored under STATE_KEY from the object's K/V state
  return ctx.get(STATE_KEY);
}
```
Hi @Nomia, the memory consumption you are showing is the memory of the restate-server, right? Did you configure `RESTATE_ROCKSDB_TOTAL_MEMORY_SIZE` to be 2GB? And when the 2GB limit is hit, does the restate-server crash?
The observed behavior is most likely caused by how RocksDB (which we use for internal storage) works. RocksDB accumulates changes in memory until a memtable becomes full or the overall memory limit is reached. In these cases, the memtables are flushed to disk, which frees memory up. The memory usage will then always show a saw-tooth pattern like the one you are reporting. This is per se not a problem if you have configured the total memory size of RocksDB to be roughly 75% of the memory available to the Restate process.
If you are seeing the restate-server get OOM-killed, then please check how much memory you've configured Restate with and how much memory is available on the machine you are running it on (or how much memory your pod requests). If the available memory is lower than the configured memory, this can cause the process to get killed.
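To make the sizing guidance concrete, here is a minimal sketch, assuming restate-server is started from a shell and that `RESTATE_ROCKSDB_TOTAL_MEMORY_SIZE` accepts a human-readable size as in the question above; the numbers are purely illustrative:

```sh
# Illustrative sizing only (not from this thread): the machine/pod gives the
# restate-server process 4GB, so following the ~75% rule of thumb RocksDB is
# capped at roughly 3GB. If the pod only requested 2GB while RocksDB was still
# allowed 2GB (plus the rest of the process), that mismatch is what can lead
# to the process being OOM-killed.
export RESTATE_ROCKSDB_TOTAL_MEMORY_SIZE=3GB
./restate-server
```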
Thanks for the detailed explanation @tillrohrmann, we will look into this next week.