
Memory issue during RL optimization

Open gpantaz opened this issue 3 years ago • 3 comments

Hello,

Many thanks for releasing the repo. I am training a model on a custom variation of MSCOCO, keeping the train/valid/test sizes equal to the Karpathy split. I have no issue training the model without RL optimization. However, I have noticed that during RL optimization the required memory increases with each epoch. I am training on an RTX 2080; each epoch lasts approximately 3-4 hours, and training occasionally runs out of memory. I tried to check whether any allocations accumulate from epoch to epoch. Is this expected?
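For reference, a minimal sketch of the kind of per-epoch check I mean (assuming PyTorch and `psutil` are available; the helper name `log_memory` is mine, not from the repo):

```python
import psutil
import torch

def log_memory(epoch):
    """Print CUDA and host memory after an RL epoch to spot accumulation."""
    torch.cuda.synchronize()
    # GPU memory currently held in tensors by PyTorch's allocator.
    gpu_mib = torch.cuda.memory_allocated() / 1024 ** 2
    # Host-side resident set size of this (main) process.
    rss_mib = psutil.Process().memory_info().rss / 1024 ** 2
    print(f"epoch {epoch}: cuda allocated {gpu_mib:.1f} MiB, host RSS {rss_mib:.1f} MiB")
```

Calling this once per epoch, right after the RL training step returns, is how I compared allocations across epochs.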

Thank you :)

gpantaz avatar Jun 21 '21 16:06 gpantaz

I am running into the same problem. Have you solved this issue? If so, could you please tell me how you overcame it?

amazingYX avatar Jan 11 '22 02:01 amazingYX

Hello, sadly no. I had been splitting my resources across several concurrent experiments to speed things up, and I ended up running one experiment at a time :/

gpantaz avatar Jan 14 '22 11:01 gpantaz

Try adding `tokenizer_pool.close()` at the end of the `train_scst` function, as sketched below.
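A sketch of where the call would go, assuming the pool is created inside `train_scst` in the repo's `train.py` (the loop body is elided here):

```python
import multiprocessing

def train_scst(model, dataloader, optim, cider, text_field):
    # A fresh pool is spawned on every call, i.e. every RL epoch; without
    # close()/join(), its worker processes outlive the epoch and accumulate.
    tokenizer_pool = multiprocessing.Pool()
    try:
        ...  # self-critical training loop, unchanged
    finally:
        # Suggested fix: tear the pool down before returning so each epoch
        # starts with a clean set of workers.
        tokenizer_pool.close()
        tokenizer_pool.join()
```

Wrapping the cleanup in `try/finally` (a small variation on putting it at the very end of the function) also releases the workers if an epoch raises an exception.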

luo3300612 avatar Jul 18 '22 04:07 luo3300612