BB2 Agent's memory persistence on GPU even after calling `remove_agent`
I am trying to optimize my GPU memory usage so that I can run as many agents as possible (each agent is a BB2 3B clone). I have noticed that each agent takes about 4-7 MB of memory on the GPU (NVIDIA T4). I regularly call `ChatServiceManager.remove_agent` to clear out unused agents or ones I intentionally want to delete, but the GPU memory is not freed (even after garbage collection).
Do you have any idea why this is the case?
Thank you!
Are you using `agent.clone`? If so, a significant amount of memory is saved across the agents, and only a small amount of memory is used to keep track of each agent's state (which is separate from the model size).
Hi, I am using clone. I am currently able to run a few hundred agents; however, I would still like to be able to free up the GPU memory once one of those worlds is closed. I am plotting GPU consumption and I don't see any memory being freed.
Have you run into an issue where you actually OOM because of this? And is it a consistent additional 4-7 MB even after removing other agents?
Yes, it OOMs at around 500 agents (the model itself takes about 9 GiB, and I am running on a T4 with approximately 16 GiB). And yes, the 4-7 MB does not get cleared out. Below is the GPU consumption from a load test I ran. The first plot is from removing the agent only; the second is with more aggressive garbage collection. I am trying to free up as much memory as possible due to infrastructure costs.
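One thing that may be worth ruling out when reading such plots: PyTorch's caching allocator keeps freed blocks reserved for reuse, so a device-level monitor can stay flat even after tensors are released. Below is a minimal sketch for comparing allocated and reserved memory. It assumes the model runs on PyTorch and that the plots come from a device-level tool such as `nvidia-smi`; the helper names `cuda_memory_report` and `release_cached_memory` are hypothetical and not part of any library.

```python
import gc

try:
    import torch  # assumption: the BB2 agents run on PyTorch
except ImportError:
    torch = None


def cuda_memory_report(device=0):
    """Return allocated vs. reserved CUDA memory in MiB (None values if no GPU)."""
    if torch is None or not torch.cuda.is_available():
        return {"allocated_mib": None, "reserved_mib": None}
    mib = 1024 ** 2
    return {
        # Memory currently occupied by live tensors.
        "allocated_mib": torch.cuda.memory_allocated(device) / mib,
        # Memory held by the caching allocator, including freed-but-cached blocks.
        "reserved_mib": torch.cuda.memory_reserved(device) / mib,
    }


def release_cached_memory(device=0):
    """Collect unreachable Python objects, then release unused cached blocks."""
    gc.collect()
    if torch is not None and torch.cuda.is_available():
        torch.cuda.empty_cache()
    return cuda_memory_report(device)
```

If `allocated_mib` drops after removing an agent while `reserved_mib` does not, the memory is being reused by PyTorch rather than leaked. If `allocated_mib` itself keeps growing, something is still holding references to GPU tensors.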
I've confirmed on my end that just calling `agent.clone()` successively does not add any additional CUDA memory allocation. Are you saving things in the world that are taking up GPU memory?
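A rough way to check for this is to list the CUDA tensors that are still reachable before and after removing an agent. The sketch below assumes PyTorch; `live_cuda_tensors` is a hypothetical helper, not a ParlAI or PyTorch function.

```python
import gc

try:
    import torch  # assumption: the agents and world hold PyTorch tensors
except ImportError:
    torch = None


def live_cuda_tensors():
    """Return (shape, dtype, size_in_bytes) for each CUDA tensor the GC still tracks."""
    found = []
    if torch is None:
        return found
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj) and obj.is_cuda:
                size = obj.element_size() * obj.nelement()
                found.append((tuple(obj.shape), str(obj.dtype), size))
        except Exception:
            # Some tracked objects raise on inspection; skip them.
            continue
    return found
```

Taking one snapshot before `remove_agent` and one after, then comparing the two lists, should show whether per-agent tensors (for example, cached history or observations stored on the world) survive the removal.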
This issue has not had activity in 30 days. Please feel free to reopen if you have more issues. You may apply the "never-stale" tag to prevent this from happening.