
Allow for adding objects to storage without round-tripping through JSON

Open willbeason opened this issue 3 years ago • 14 comments

What part of OPA would you like to see improved?

Motivating issue: https://github.com/open-policy-agent/gatekeeper/issues/2060

When adding data to the in-memory OPA storage object, inmem.store.Write round-trips the incoming object through JSON to ensure the object can be validly added to OPA storage. Adding an object that cannot be serialized to JSON could cause all sorts of misbehavior when running OPA policies.

From inmem.store.Write:

func (db *store) Write(_ context.Context, txn storage.Transaction, op storage.PatchOp, path storage.Path, value interface{}) error {
	underlying, err := db.underlying(txn)
	if err != nil {
		return err
	}
	val := util.Reference(value)
	if err := util.RoundTrip(val); err != nil { // <-- this function
		return err
	}
	return underlying.Write(op, path, *val)
}

Unfortunately, this round-tripping is very slow compared to the rest of what needs to be done to store/update the new data.

Benchmark with round-tripping:

~/frameworks/constraint$ go test ./pkg/client/drivers/local/ --run=NONE --bench=. --benchtime=10s --cpuprofile=cpu.out
goos: linux
goarch: amd64
pkg: github.com/open-policy-agent/frameworks/constraint/pkg/client/drivers/local
cpu: Intel(R) Xeon(R) CPU @ 2.20GHz
BenchmarkStorages_AddData-16    	   29972	    334523 ns/op
PASS
ok  	github.com/open-policy-agent/frameworks/constraint/pkg/client/drivers/local	17.951s

Benchmark without round-tripping (with the check commented out):

~/frameworks/constraint$ go test ./pkg/client/drivers/local/ --run=NONE --bench=. --benchtime=10s --cpuprofile=cpu.out
goos: linux
goarch: amd64
pkg: github.com/open-policy-agent/frameworks/constraint/pkg/client/drivers/local
cpu: Intel(R) Xeon(R) CPU @ 2.20GHz
BenchmarkStorages_AddData-16    	 4121306	      2765 ns/op
PASS
ok  	github.com/open-policy-agent/frameworks/constraint/pkg/client/drivers/local	15.388s

(image: CPU profile of the benchmark)

In the benchmark, >99% of the time spent adding new objects is in round-tripping through JSON.

Describe the ideal solution

Ideally, there would be some way for callers to add data to inmem.store without this round-tripping. Callers may be able to do the validation themselves more cheaply, or may be operating in an environment where all objects are by definition serializable to JSON (e.g. the incoming request was JSON, and that is the object to be put into OPA storage), making the validation unnecessary.

This option could take many forms:

  • An option on inmem.store at creation time to not round-trip objects through JSON
  • An option on Write() to skip this check
  • A separate function WriteNoRoundTrip() which is the same as Write() but does not include the check

Additional Context

Per the motivating bug - the issue which (at least) one user of Gatekeeper is having is that the speed of these writes means that it takes a long time for queries to run since it takes too long to acquire a read lock for the Storage object (which is required for Query to run).

willbeason avatar May 25 '22 18:05 willbeason

The benchmark I'm using: https://gist.github.com/willbeason/9c8e2a570baeeee946f9f9c7fd26359e

willbeason avatar May 25 '22 18:05 willbeason

For callers who have objects with a DeepCopy-like method defined, the savings are ~90% instead of ~99%. This applies when the caller cannot guarantee the object will remain unmutated and so needs to deep-copy it before calling inmem.store.Write:

~/frameworks/constraint$ go test ./pkg/client/drivers/local/ --run=NONE --bench=. --benchtime=10s --cpuprofile=cpu.out
goos: linux
goarch: amd64
pkg: github.com/open-policy-agent/frameworks/constraint/pkg/client/drivers/local
cpu: Intel(R) Xeon(R) CPU @ 2.20GHz
BenchmarkStorages_AddData-16    	  334730	     33309 ns/op
PASS
ok  	github.com/open-policy-agent/frameworks/constraint/pkg/client/drivers/local	12.077s

(This benchmark calls .DeepCopy() on the object before passing it to the modified form of inmem.store.Write that doesn't round-trip through JSON.)

willbeason avatar May 25 '22 18:05 willbeason

Thinking further, a "good enough" solution for us would be to do the round-tripping validation without needing to acquire the write lock on the storage. That would preserve the validation while reducing the time the lock needs to be held by 99%.

The implementation for that might be more complex, though.

willbeason avatar May 25 '22 18:05 willbeason

IIRC we're doing the roundtrip to support feeding arbitrary structs into the storage and having them converted to native Go types (so a struct { Foo, Bar string } becomes a map[string]interface{} with keys "Foo" and "Bar", unless there are json struct tags). It seemed like a good idea at the time to do it like this.

Another approach to address the problem would be to make util.RoundTrip cleverer: traverse the passed-in element's structure, and only do the JSON marshal/unmarshal dance for leaves that are not native types. At the core, I'd imagine something like

switch x.(type) {
    case nil, bool, string, float64: // ... more native types
         // nothing to do
    case map[string]interface{}, []interface{}:
         // recurse into the values
    default: // must be a struct (or other non-native type)
         // do the JSON roundtrip
}

If what comes in is a native type, I don't think (but we should verify) that the overhead would be as bad as what we have now.

I suppose that way, we'd also improve the situation for those folks unable to disable the roundtrip on write (#4709), namely OPA-as-server users. (Update: just realized that this point is moot; the server runtime could disable the roundtrip, too, since there's no need for it there.)

srenatus avatar May 26 '22 18:05 srenatus

This ticket is also interesting on the topic of JSON serialization/deserialization performance. Although obviously, nothing beats skipping it in the first place :) Do you have any experience working with that library, @willbeason? Would be interesting to see what difference it makes to the numbers you presented!

anderseknert avatar May 27 '22 09:05 anderseknert

Another side effect of round-tripping through JSON is an implicit deep copy of the entire struct. So if we only round-tripped subfields that are not native JSON types, we would no longer be deep-copying objects put into OPA storage.

(The rest of this exclusively deals with JSON-native types)

My guess is what srenatus@ proposes would have similar performance gains to the DeepCopy. Whether we actually copy the data into a new object likely has little CPU performance impact (compared to not doing it but still traversing the tree) - but of course you'd just be holding on to more memory after. In either case, it's much faster/cheaper to do this than passing everything through JSON.

willbeason avatar May 27 '22 16:05 willbeason

From looking at open PRs and Issues - unfortunately json-iterator doesn't look like it is actively maintained.

willbeason avatar May 27 '22 16:05 willbeason

Confirmed it is ~10x faster to do what @srenatus suggests:

~/frameworks/constraint$ go test ./pkg/client/drivers/local/ --run=NONE --bench=.
goos: linux
goarch: amd64
pkg: github.com/open-policy-agent/frameworks/constraint/pkg/client/drivers/local
cpu: Intel(R) Xeon(R) CPU @ 2.20GHz
BenchmarkStorages_AddData/structured-16         	    2695	    385944 ns/op
BenchmarkStorages_AddData/jsonified-16          	   35253	     32523 ns/op

willbeason avatar May 27 '22 17:05 willbeason

With the JSONify deepcopy, it is only 4x faster. It is possible that won't fully resolve our issue so I'll send a follow-up to reduce the time the write-lock is held.

willbeason avatar May 27 '22 19:05 willbeason

Thanks @willbeason, I had not checked that library since the ticket was created. Not sure to what extent it matters, unless there's some known issue that would e.g. be different from what we have today. I mean, the JSON library in the stdlib is probably not seeing a ton of updates either. Definitely agree that it would be nicer to see it actively maintained though.

anderseknert avatar May 27 '22 20:05 anderseknert

This issue has been automatically marked as inactive because it has not had any activity in the last 30 days.

stale[bot] avatar Jun 26 '22 20:06 stale[bot]

@srenatus it looks like the solution we're going with here is #4709 . You mentioned adopting the change on the next release cycle. Do you know what the timeline for that is?

maxsmythe avatar Aug 03 '22 23:08 maxsmythe

It didn't make it for 0.43.0, but I haven't forgotten about it. Let's aim for 0.44.0.

srenatus avatar Aug 04 '22 07:08 srenatus

Thanks!

maxsmythe avatar Aug 06 '22 01:08 maxsmythe

@maxsmythe If no unexpected regressions come up, this is expected to come out with 0.44.0.

srenatus avatar Aug 18 '22 06:08 srenatus

Thanks!

maxsmythe avatar Aug 18 '22 23:08 maxsmythe