[Feature Request] Use read-write lock instead of global lock when sampling is non-mutating
Motivation
The replay buffer prototypes use a global threading.RLock, which unnecessarily limits the number of concurrent reads that can be performed on a replay buffer whose contents live in shared memory. Synchronising all access through a single reentrant lock causes more contention than necessary: concurrent reads are permissible for many replay buffer/storage types where sampling does not mutate the underlying buffer, and allowing them can improve performance in distributed RL tasks.
Solution
Switching to a read-write lock resolves this by allowing multiple concurrent reads (buffer samples); sampling blocks only while the write lock is held (e.g. when inserting into the replay buffer or updating priorities).
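A minimal sketch of the idea (the `RWLock` and `ToyReplayBuffer` classes below are illustrative only, not torchrl's actual classes or API): readers (`sample`) share the lock, while writers (`extend`, priority updates) take it exclusively.

```python
import random
import threading


class RWLock:
    """Minimal readers-writer lock: many concurrent readers, exclusive writers."""

    def __init__(self):
        self._readers = 0
        self._readers_lock = threading.Lock()  # protects the reader count
        self._writer_lock = threading.Lock()   # held while a writer (or any reader) is active

    def acquire_read(self):
        with self._readers_lock:
            self._readers += 1
            if self._readers == 1:
                self._writer_lock.acquire()

    def release_read(self):
        with self._readers_lock:
            self._readers -= 1
            if self._readers == 0:
                self._writer_lock.release()

    def acquire_write(self):
        self._writer_lock.acquire()

    def release_write(self):
        self._writer_lock.release()


class ToyReplayBuffer:
    """Illustrative buffer: sample() is read-only, extend() mutates storage."""

    def __init__(self):
        self._storage = []
        self._lock = RWLock()

    def sample(self, batch_size):
        # Non-mutating path: any number of threads may sample concurrently.
        self._lock.acquire_read()
        try:
            return random.sample(self._storage, min(batch_size, len(self._storage)))
        finally:
            self._lock.release_read()

    def extend(self, data):
        # Mutating path: exclusive access, blocks samplers until the write completes.
        self._lock.acquire_write()
        try:
            self._storage.extend(data)
        finally:
            self._lock.release_write()
```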
Alternatives
We could move the synchronisation strategy into the Sampler injected into the replay buffer as a constructor argument (https://github.com/pytorch/rl/blob/main/torchrl/data/replay_buffers/rb_prototype.py#L38) rather than including it in the sample method of the replay buffer itself. This way we could introduce, for example, a ConcurrentSampler where the synchronisation strategy (if any) is within the domain of the sampler (see the sketch below).
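A hedged sketch of that alternative (the name `ConcurrentSampler` comes from this proposal, but the signature below is illustrative and does not necessarily match the prototype's actual Sampler interface; `RWLock` is the class sketched above):

```python
class ConcurrentSampler:
    """Hypothetical sampler that owns its own synchronisation strategy."""

    def __init__(self):
        self._lock = RWLock()  # readers-writer lock as sketched above

    def sample(self, storage, batch_size):
        # Read path: runs concurrently with other sample() calls and blocks
        # only while a writer holds the lock.
        self._lock.acquire_read()
        try:
            indices = random.sample(range(len(storage)), min(batch_size, len(storage)))
            return [storage[i] for i in indices]
        finally:
            self._lock.release_read()

    def write_lock(self):
        # The replay buffer would acquire this around extend()/update_priority().
        return self._lock
```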
Additional context
https://github.com/pytorch/rl/blob/main/torchrl/data/replay_buffers/rb_prototype.py
Not all replay buffers should use read-write locks; this applies only when sampling is a non-mutating operation.
Checklist
- [x] I have checked that there is no similar issue in the repo (required)
cc @xiaomengy
This was more than a year ago but we should fix this! @xiaomengy do you have a suggestion? Would you like to help with this?
Yes, I'm happy to help with this, but I'm very busy this week. May I take a look next week?