opa icon indicating copy to clipboard operation
opa copied to clipboard

Investigate storing AST values in in-memory store instead of maps/slices/etc.

Open tsandall opened this issue 3 years ago • 9 comments
trafficstars

Currently the in-memory store uses maps and slices to store base documents. In some cases, this can result in high latency during policy evaluation if a large amount of data has to be read and evaluated (because the evaluator has to convert the interface{} types into AST values.) We should run some experiments on different data sets to measure the performance improve we'd receive by simply storing AST values inside the in-memory store. We will need to monitor the memory usage because the AST values have fixed cost top of the standard map/slice types.

tsandall avatar Dec 15 '21 22:12 tsandall

:+1:

schneefisch avatar Dec 16 '21 09:12 schneefisch

Also on top, the transaction’s Read method does a deep copy of maps at least

mjungsbluth avatar Dec 16 '21 17:12 mjungsbluth

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days.

stale[bot] avatar Jan 15 '22 18:01 stale[bot]

From looking at the code the evaluation code can already handle AST Values returned from storage

Instead of rewriting large parts, would it work to implement the AST Value interfaces for maps and arrays in either a copy on write or read only fashion, backed by efficient golang maps? I am not sure if the using code places actually try to write to data that is originating in storage.

My assumption is that large data blocks need to be optimized for lookup speed anyways which will cut down the expected data size after each lookup step anyways.

mjungsbluth avatar Feb 09 '22 19:02 mjungsbluth

@mjungsbluth Read() only does a deep copy if it's a write transaction. Policy queries are read-only so no deep copy is performed inside the in-memory store.

My thinking for this issue was that we could simply have the server convert data into AST types before inserting into the store. This would apply to the bundle plugin as well as writes on the v1/data API. This way there would be no need for interface{} to ast.Value conversion during eval.

tsandall avatar Feb 10 '22 16:02 tsandall

Overlooked that deepcopy only happens on read…

Would this approach have the potential to break storage implementations? Otherwise this sounds like a fair approach to try.

@tsandall you were also concerned about memory consumption but since the conversion via the ast.InterfaceToValue method happens anyways already, this would at least make memory usage more predictable or am I wrong?

mjungsbluth avatar Feb 10 '22 19:02 mjungsbluth

Would this approach have the potential to break storage implementations? Otherwise this sounds like a fair approach to try.

It would--but we can workaround that in OPA by just checking the type of the store... something like:

package inmem

func Is(s storage.Store) bool {
  _, ok := s.(*store)
  return ok
}

With this in place, the server handler and bundle plugin can ask if the store they're about to transact on is the in-memory one...

@tsandall you were also concerned about memory consumption but since the conversion via the ast.InterfaceToValue method happens anyways already, this would at least make memory usage more predictable or am I wrong?

I think it'll depend on the access pattern. Today, if you load a bunch of external data into OPA it's stored as maps/slices/etc. With this change, we'd still be using maps/slices/etc. but there's the additional overhead of the ast.Value types. If the policy queries load the entirety of the store on every query, then yes, it would make the memory usage more predictable, but if that doesn't happen, then we'd be paying the price all the time.

Which begs the question, perhaps the inmem store should just convert to the interface{} values into AST values lazily... we already do this in the evaluator but the problem is that we don't have a place to store them across queries. I'm not sure how much work that would be--and it would only be valuable if the steady-state memory usage was significantly higher.

😅 this is why I wrote "investigate storing ..." because it requires some investigation/experimentation to determine the best path forward. Hope this helps!

tsandall avatar Feb 11 '22 16:02 tsandall

@srenatus @anderseknert I'm adding this into the backlog because I think it's worth at least exploring/experimenting with to see how much of a win it could be... it's not immediately obvious how many repercussions there could be on different parts of OPA--anything reading out of the store would be impacted if the inmem implementation just started returning AST values all of a sudden. That said, I wonder if we could introduce this gradually so that store users could opt-in to receiving AST values back from the store. The store could then either return the AST or convert it into interface{} for the caller. This way we wouldn't break anything, but topdown would still benefit from the speedup. Thoughts?

tsandall avatar Feb 19 '22 15:02 tsandall

@tsandall if you want to see what a bored data plane with a single core does on bundle activation https://github.com/open-policy-agent/opa/issues/6533#issue-2088896905. The 10m schedule time was by mistake, because control plane created a json with an updated timestamp every 10m. The flame graph shows clearly the DeepCopy as bottleneck of inmemory storage. A list of structs would be much faster and if you use sync.Pool you might be able to speed up allocation even more.

szuecs avatar Jan 24 '24 16:01 szuecs