Decide and enforce a data retention policy for the Cache
Feature Request
To my understanding, all of the events that build up the cache's state are saved, and they are then replayed on startup during reconciliation to rebuild that state. For continuous trading, it is important to at some point start forgetting trades, orders, and flat positions.
I would suggest something along the lines of the following (a rough sketch in code follows the list):
- Forget flat positions, unless they hold some useful info
- Forget trades that led to a flat position
- Forget orders that are now closed
- I'm not sure if commands are saved somewhere (or why), but forget those after some time too
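A rough sketch of what such a purge pass might look like, covering the first three rules (all names here are hypothetical, not the actual nautilus_trader Cache API):

```python
from datetime import datetime, timedelta, timezone

def purge_closed_state(cache, max_age: timedelta) -> None:
    # Hypothetical purge pass; the method and attribute names here just
    # encode the rules above, they are not the real Cache interface.
    cutoff = datetime.now(timezone.utc) - max_age

    for position in list(cache.positions()):
        # A flat position past the cutoff takes its closed orders (and the
        # trades/events they reference) with it.
        if position.is_flat and position.closed_time < cutoff:
            for client_order_id in position.client_order_ids:
                cache.delete_order(client_order_id)
            cache.delete_position(position.id)
```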
Things to consider:
Much of this would still be useful for post-trade analysis; the request is only that the live trading client's cache doesn't remember all these things. They could still be stored in the database, just not loaded on startup. Perhaps they could be moved to a different key, e.g. prepending the key with HISTORICAL: vs LIVE:.
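For the Redis-backed cache, that key move could be as simple as a rename out of the live namespace (purely illustrative; LIVE:/HISTORICAL: are the prefixes suggested above, not an existing convention):

```python
import redis

r = redis.Redis()

def archive_key(key: str) -> None:
    # Keep the data in the database for post-trade analysis, but move it
    # out of the namespace that would be loaded on startup.
    if key.startswith("LIVE:"):
        r.rename(key, "HISTORICAL:" + key[len("LIVE:"):])
```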
For each of these things, it might still be worth keeping them around for some time; someone might have use cases like "what are my trades in the last N minutes" or "let's send a maker order to the order book that has historically had the most successful maker fills".
So instead of a strict rules policy like the above, maybe every item should simply be stored for at least N minutes. This could be simpler as well.
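On the database side, "stored for at least N minutes" maps naturally onto a TTL that starts once an object goes terminal. A minimal sketch, assuming a Redis backing store and a hypothetical key scheme:

```python
import redis

r = redis.Redis()
MIN_RETENTION_SECS = 60 * 60  # "at least N minutes", here N = 60

def on_terminal(key: str) -> None:
    # Rather than deleting immediately when an order closes or a position
    # goes flat, start the retention clock; Redis expires the key itself
    # once the minimum window has passed.
    r.expire(key, MIN_RETENTION_SECS)
```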
I like this direction, and the functionality is definitely needed.
I believe the main use case here is continuous trading: we want to avoid an unbounded increase in memory consumption as more and more objects/events are held in the cache.
I don't think we should immediately forget/drop out of memory the objects you've listed above though. The information is too useful for all kinds of query operations, the space required in memory is not that high, and there would be some overhead in bringing 'forgotten' objects back (even from Redis).
I do think there should be a configurable max lookback though, with objects in the cache that fall outside it automatically 'trimmed' out of memory. Do we need configuration per object type? That may not be wise, as some objects rely on others being available for correct state.
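Concretely, that could be a single knob rather than per-type settings; a hypothetical config (not the existing CacheConfig) might look like:

```python
from dataclasses import dataclass
from datetime import timedelta

@dataclass
class CacheRetentionConfig:
    # Hypothetical: one window applied to all trimmable object types.
    # A single shared value avoids per-type settings getting out of sync,
    # since orders, order events, and positions reference each other.
    max_lookback: timedelta = timedelta(days=1)
```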
So were you thinking of a strict max lookback applied on the database side? How would this work for dependent objects, such as the events that are part of a position, or for positions that are not flat but very old?
Might this follow a "garbage collection" pattern?
Would it be possible to observe the subscriptions that indicators and actors register with the data engine, treating them as active references, to understand what the dependencies are?
My assumption is that indicators maintain internal state to accomplish their work, and don't need the cache to persist much historical data beyond what is "demanded" in from the catalog or other adapters, and then only until consumed. My further assumption is that for continuously operating systems, indicator dehydration/rehydration would be implemented in order to recover in the event of an outage. This would go hand in hand with other types of state persistence needed for outage recovery (e.g. portfolio history).
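A minimal sketch of what dehydration/rehydration could look like for a toy indicator, snapshotting state to JSON (the indicator and the method names are hypothetical, not part of nautilus_trader):

```python
import json

class ToyEMA:
    # Toy exponential moving average, standing in for any stateful indicator.
    def __init__(self, alpha: float) -> None:
        self.alpha = alpha
        self.value = None

    def update(self, price: float) -> None:
        if self.value is None:
            self.value = price
        else:
            self.value = self.alpha * price + (1 - self.alpha) * self.value

    def dehydrate(self) -> str:
        # Snapshot internal state so it can be persisted alongside other
        # recovery state (e.g. portfolio history).
        return json.dumps({"alpha": self.alpha, "value": self.value})

    @classmethod
    def rehydrate(cls, blob: str) -> "ToyEMA":
        state = json.loads(blob)
        ema = cls(state["alpha"])
        ema.value = state["value"]
        return ema
```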
> So were you thinking of a strict max lookback applied on the database side? How would this work for dependent objects, such as the events that are part of a position, or for positions that are not flat but very old?
So all of the market data right now is held in deques with a maxlen, so that's a non-issue. I think the types with the most growth in the cache are (all of them related):
- Order
- OrderEvent and all subclasses
- Position
So your idea of trimming based on positions and their state could work well. It could be a configurable max lookback based on flat positions: if a position is flat and outside the window, then position.client_order_ids could be used to find all associated orders, and both these and the position could be dropped out of the cache's memory (along with cleaning up the internal indexes). The orders and positions should be the only objects holding references to events, so those would fall out of scope and be garbage collected as well.
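Putting that together, a trim pass could look roughly like the following (pseudocode against a hypothetical cache interface; position.client_order_ids comes from the discussion above, the rest is illustrative):

```python
def trim_cache(cache, now_ns: int, max_lookback_ns: int) -> None:
    # Hypothetical trim pass; method names are illustrative, not the real
    # Cache API. Timestamps are UNIX nanoseconds.
    cutoff_ns = now_ns - max_lookback_ns

    for position in list(cache.positions_closed()):
        if position.ts_closed >= cutoff_ns:
            continue  # still inside the lookback window
        # Follow the position's order linkage to find everything to drop.
        for client_order_id in position.client_order_ids:
            cache.delete_order(client_order_id)  # also cleans internal indexes
        cache.delete_position(position.id)
    # Orders and positions were the only holders of event references, so the
    # associated events now fall out of scope and Python's garbage collector
    # reclaims them.
```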
I don't think this max lookback should be purely for the database, because we need to solve the unbounded growth in the memory used by the cache too. There's an argument that the above behavior should only be applied to the cache, and that any database cleanup should happen through some out-of-band process external to Nautilus, based on a user's own devops needs and infra capacity.
Nearly any database could be used to back the cache if someone were so inclined as to implement the interface; it just so happens that the current impl we have, using Redis, also requires memory. So to completely solve the issue, this trimming strategy probably needs to extend to the database (though not be applied exclusively at the database level).
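An out-of-band database cleanup could then be a small external script run on a schedule, sized to the user's own infra. A sketch, assuming archived objects are Redis hashes with a ts_closed epoch-seconds field (both assumptions, not the actual key layout):

```python
import time

import redis

r = redis.Redis()
MAX_AGE_SECS = 7 * 24 * 3600  # retention chosen to fit infra capacity

def cleanup() -> None:
    # Scan archived keys and delete those past the retention window.
    cutoff = time.time() - MAX_AGE_SECS
    for key in r.scan_iter("HISTORICAL:*"):
        ts = r.hget(key, "ts_closed")
        if ts is not None and float(ts) < cutoff:
            r.delete(key)
```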