tidb icon indicating copy to clipboard operation
tidb copied to clipboard

statistics: use another way to merge topn

Open winoros opened this issue 2 years ago • 9 comments

What problem does this PR solve?

Issue Number: ref #50761

Problem Summary:

What is changed and how it works?

We use the property that items inside both the TopN and the histogram are ordered to speed up the process.

And use a heuristic cutting:

  • We record the avg num per distinct value of each histogram notNullCount / ndv_in_hist.
  • Then for a topn item, we can use the mentioned property to know the TopNs that contain the item.
  • Then we can easily know in which partition the item occurs in the histogram.
  • The maximum possible occurrence of this item is the sum occurrence of the affected TopNs + the avg num per distinct value of each affected histogram
  • If the maximum possible occurrence is still smaller than the currently maintained smallest global TopN item, we don't need to insert this one into the heap and test it.

In this way, the speed is improved while the CPU is saved. img_v2_3e92f289-37de-4fd4-9b33-2a35f25c9d5g The CPU usage. Previous VS This pull.

Running tool: /DATA/disk4/yiding/go/bin/go test -benchmem -run=^$ -tags intest,deadlock -bench ^BenchmarkMergePartTopN2GlobalTopNWithHists$ github.com/pingcap/tidb/pkg/statistics/handle/globalstats

goos: linux
goarch: amd64
pkg: github.com/pingcap/tidb/pkg/statistics/handle/globalstats
cpu: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
BenchmarkMergePartTopN2GlobalTopNWithHists/Size100-72         	     159	   7249785 ns/op	   34714 B/op	     113 allocs/op
BenchmarkMergePartTopN2GlobalTopNWithHists/Size1000-72        	       9	 117386869 ns/op	  100136 B/op	    1013 allocs/op
BenchmarkMergePartTopN2GlobalTopNWithHists/Size2000-72        	       4	 272247218 ns/op	  173416 B/op	    2013 allocs/op
BenchmarkMergePartTopN2GlobalTopNWithHists/Size5000-72        	       2	 787035728 ns/op	  393256 B/op	    5013 allocs/op
BenchmarkMergePartTopN2GlobalTopNWithHists/Size10000-72       	       1	1919716271 ns/op	  759656 B/op	   10013 allocs/op
PASS
ok  	github.com/pingcap/tidb/pkg/statistics/handle/globalstats	17.839s

Running tool: /DATA/disk4/yiding/go/bin/go test -benchmem -run=^$ -tags intest,deadlock -bench ^BenchmarkMergePartTopN2GlobalTopNWithHists$ github.com/pingcap/tidb/pkg/statistics/handle/globalstats

goos: linux
goarch: amd64
pkg: github.com/pingcap/tidb/pkg/statistics/handle/globalstats
cpu: Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz
BenchmarkMergePartTopN2GlobalTopNWithHists/Size100-72         	     122	  10071203 ns/op	  177982 B/op	      30 allocs/op
BenchmarkMergePartTopN2GlobalTopNWithHists/Size1000-72        	       6	 186292872 ns/op	  176418 B/op	      28 allocs/op
BenchmarkMergePartTopN2GlobalTopNWithHists/Size2000-72        	       3	 398591599 ns/op	  178269 B/op	      31 allocs/op
BenchmarkMergePartTopN2GlobalTopNWithHists/Size5000-72        	       1	1436609512 ns/op	  179432 B/op	      33 allocs/op
BenchmarkMergePartTopN2GlobalTopNWithHists/Size10000-72       	       1	4363627038 ns/op	  177336 B/op	      31 allocs/op
PASS
ok  	github.com/pingcap/tidb/pkg/statistics/handle/globalstats	18.587s
n=100,            7249785/10071203=72%
n=1000         117386869/186292872=63%
n=2000         272247218/398591599=68%
n=5000        787035728/1436609512=55%
n=10000      1919716271/4363627038=44%

The mem usage is also in an acceptable range.

Check List

Tests

  • [x] Unit test
  • [x] Integration test
  • [ ] Manual test (add detailed scripts or steps below)
  • [ ] No need to test
    • [ ] I checked and no code files have been changed.

Side effects

  • [ ] Performance regression: Consumes more CPU
  • [ ] Performance regression: Consumes more Memory
  • [ ] Breaking backward compatibility

Documentation

  • [ ] Affects user behaviors
  • [ ] Contains syntax changes
  • [ ] Contains variable changes
  • [ ] Contains experimental features
  • [ ] Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

winoros avatar Oct 18 '23 19:10 winoros

PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

ti-chi-bot[bot] avatar Nov 17 '23 15:11 ti-chi-bot[bot]

Codecov Report

Attention: Patch coverage is 83.85650% with 36 lines in your changes missing coverage. Please review.

Project coverage is 61.1317%. Comparing base (a8a9a04) to head (b819a09). Report is 724 commits behind head on master.

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #47765        +/-   ##
================================================
- Coverage   70.8030%   61.1317%   -9.6714%     
================================================
  Files          1464       1792       +328     
  Lines        436234     712622    +276388     
================================================
+ Hits         308867     435638    +126771     
- Misses       108065     251746    +143681     
- Partials      19302      25238      +5936     
Flag Coverage Δ
integration 44.2336% <71.7488%> (?)
unit 74.8765% <82.5112%> (+4.2746%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
dumpling 57.8032% <ø> (+3.8075%) :arrow_up:
parser ∅ <ø> (∅)
br 57.5065% <ø> (+11.6041%) :arrow_up:

codecov[bot] avatar Jan 26 '24 17:01 codecov[bot]

And some codes like the in-place updates for the heap are optimized for the memory.

winoros avatar Jan 29 '24 13:01 winoros

/retest

winoros avatar Feb 05 '24 22:02 winoros

/retest

hawkingrei avatar Feb 20 '24 07:02 hawkingrei

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hi-rustin

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot[bot] avatar Feb 22 '24 09:02 ti-chi-bot[bot]

[LGTM Timeline notifier]

Timeline:

  • 2024-02-22 09:58:29.271545344 +0000 UTC m=+523998.019168506: :ballot_box_with_check: agreed by hi-rustin.

ti-chi-bot[bot] avatar Feb 22 '24 09:02 ti-chi-bot[bot]

/test all

hawkingrei avatar Jun 09 '24 13:06 hawkingrei