
Support clearing views via pagination

cipherboy opened this issue 10 months ago

I think we need a performance comparison before adopting this. However, this might help address the comments in https://github.com/orgs/openbao/discussions/966: rather than loading the full paths of all 150k secrets at once, we can load only a page (7.5k secrets) at a time, which should reduce peak memory usage.

Benchmarks:

[cipherboy@x1c openbao]$ go test -bench=. -benchmem -benchtime=150000x -run=Bench ./.../physical/crosstest
/home/cipherboy/GitHub/cipherboy/openbao
goos: linux
goarch: amd64
pkg: github.com/openbao/openbao/physical/crosstest
cpu: 13th Gen Intel(R) Core(TM) i7-1370P
BenchmarkClearView/Inmem/WithoutPagination-20         	  150000	       900.7 ns/op	     359 B/op	       1 allocs/op
BenchmarkClearView/Inmem/WithPagination-20            	  150000	       968.1 ns/op	     529 B/op	       2 allocs/op
BenchmarkClearView/Raft/WithoutPagination-20          	  150000	     43155 ns/op	   54959 B/op	     370 allocs/op
BenchmarkClearView/Raft/WithPagination-20             	  150000	     45413 ns/op	   55902 B/op	     372 allocs/op

Performance looks about the same.

cipherboy · Mar 08 '25 14:03

@JanMa This should be ready for review.

cipherboy · Mar 20 '25 01:03

Interesting, so the performance is about the same with pagination. Taking into account other places where pagination is used, is this expected?

Gabrielopesantos · May 18 '25 20:05

@Gabrielopesantos The performance we're hoping to improve here isn't time, it's memory. There'll be a little more churn since we're making more function calls (which incur compute and memory overhead), but we don't keep as much in memory at any one time. If we have 5 million entries, a single List call stores all entries in memory at once, whereas here we might make 1k ListPage calls (each returning 5k entries) with a much smaller memory footprint at any given moment.

Go doesn't have a good way to measure maximum memory usage sadly, and it'd be dependent on when the GC runs.

cipherboy · May 18 '25 21:05

As the linked issue mentions the deletion timing out, I somewhat assumed the idea was to improve the overall time.

In the event that the server has a very small amount of memory but a large amount of swap, or there are many parallel deletion operations, deletion could take very long due to constant swapping. Pagination will improve the overall time in that case. :-)

But if there's just a lot of stuff, nothing we can do (in this change) will improve that. Changes like the draft gist for storage partitions (https://gist.github.com/cipherboy/92828019e89d8adde219d08dec47e5a9) would eventually let you drop a whole tree, much faster and much more easily.

cipherboy · May 19 '25 21:05

I have updated this to also acquire a transaction during pagination, if available. While this doesn't guarantee nothing has changed (each pagination call adds to a write transaction, which will bloat its size), it does at least guarantee consistency of the scan, as before. So the semantics are overall unchanged.

cipherboy · May 23 '25 19:05

@wslabosz-reply Do you see a failure in the tests? I've rerun three times and each time it says job 7 exited with code 143 (128 + 15, i.e. SIGTERM), but I do not see why. Is there a memory problem here perhaps?

cipherboy · May 26 '25 16:05

Per review from @wslabosz-reply, future maintainer of /vault, merging.

cipherboy · May 28 '25 00:05