erigon icon indicating copy to clipboard operation
erigon copied to clipboard

[test] bplus index inefficient warmup

Open awskii opened this issue 1 year ago • 8 comments

2 bugs inside and outside bpsTree.traverse:

  • depth of the tree was always 0 which created flat PARTIALLY ordered list of keys. Also 2 useless keys (first and second) were permanently added into cache.
  • bug in iteration caused less node to be inserted in cache, so we did more random reads to files.

If set cacheEvery to 4 or 8 it would take each x element into cache. if =2 then ~0.8% of keys are in cache. cacheEvery=8 keeps ~3.3% of keys in memory and slightly decreases amount of cow and splits in db, and i think this trade of doesnt worth it. Feels like having several layers of lists in cache does not increase speed but increases logic complexity

amount of cow and split reduced by 15%, spills and wops goes to 0 after update but increases amount of pagefaults i during bs seek. image

This change does not require creation of new indices, but seriously affects structure of in-mem cache. Causes less random reads as well.

awskii avatar Jun 24 '24 18:06 awskii

need remove printf

AskAlexSharov avatar Jun 26 '24 02:06 AskAlexSharov

i don't understand how to read your pictures. what is on the left and what is on the right? (if left - before your fix, and right - after. then after fix commit gets lower and amount of pageFaults increased? so we getting worse?)

please double-check that your pictures are not affected by https://github.com/ledgerwatch/erigon/pull/10869

AskAlexSharov avatar Jun 26 '24 02:06 AskAlexSharov

I did try main vs this branch (cacheEvery=2) vs this branch (cacheEvery=8) - print_stages: alloc=13.9GB vs alloc=23.4GB vs alloc=23.4GB vs alloc=51.0GB (bor-mainnet)

cacheEvery=8 - startup time is very slow

AskAlexSharov avatar Jun 26 '24 03:06 AskAlexSharov

cacheEvery needs comment about max/min possible value. (like can we set 32? 1?)

AskAlexSharov avatar Jun 26 '24 03:06 AskAlexSharov

i suppose to use cacheEvery=2 as we thought we use before. Going to distinct if not affected by #10869 today.

awskii avatar Jun 26 '24 06:06 awskii

@awskii do you have link to board? i wanna take a look at "txs/sec" over big "rate interval"

AskAlexSharov avatar Jun 26 '24 12:06 AskAlexSharov

also i don't understand - how database commit (not commitment) speed can depend on .bt keys cache.

AskAlexSharov avatar Jun 26 '24 12:06 AskAlexSharov

confirmed that whole perf test here is screwed and related to #10869 and other recent updates to `main. Clean test later on.

awskii avatar Jun 28 '24 00:06 awskii

I added last-mile optimisation: if R-L > 8 - scan one by one. Picked M for now is 256: optimal usage of memory and access time.

awskii avatar Jul 25 '24 21:07 awskii

i broke a prune yet still it's good solution. will fix in the morning.

awskii avatar Jul 25 '24 23:07 awskii

N: 12000000, M: 256 skip since shard <= 4
BenchmarkBpsTreeSeek-12    	 1236522	       986.5 ns/op	     883 B/op	      12 allocs/op

N: 12000000, M: 256 skip since shard <= 64
BenchmarkBpsTreeSeek-12    	  426034	      2708 ns/op	    2577 B/op	      38 allocs/op

N: 12000000, M: 256 skip since shard <= 16
BenchmarkBpsTreeSeek-12    	  805736	      1277 ns/op	    1154 B/op	      16 allocs/op

N: 12000000, M: 256 skip since shard <= 1 (means do not scan even on small ranges)
BenchmarkBpsTreeSeek-12    	 1214692	       967.4 ns/op	     866 B/op	      12 allocs/op
PASS

well, usefulness of that skip feature is not that clear but it makes less random reads which may be helpful as well. I set constant DefaultBtreeStartSkip=4 for now.

for NVMe disks change is barely noticable but for slower disks difference could be bigger. Flatten structure of initial array of cached nodes makes 30% less comparisons than levelled structure does.

awskii avatar Jul 26 '24 18:07 awskii