mongodb-d4
Performance decreases with denormalization
Here are two designs for the TATP benchmark:
Design 1
[00] ACCESS_INFO
denorm: None
shardKeys: [u's_id']
[01] CALL_FORWARDING
denorm: None
shardKeys: [u's_id']
[02] SPECIAL_FACILITY
denorm: None
shardKeys: [u's_id']
[03] SUBSCRIBER
denorm: None
shardKeys: [u's_id']
Design 2
[00] ACCESS_INFO
denorm: None
shardKeys: [u's_id', u'ai_type']
[01] CALL_FORWARDING
denorm: SPECIAL_FACILITY
shardKeys: []
[02] SPECIAL_FACILITY
denorm: SUBSCRIBER
shardKeys: []
[03] SUBSCRIBER
denorm: None
shardKeys: [u's_id']
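A minimal sketch of what Design 2's denormalization does to the documents, reading `denorm: X` as "this collection is embedded into X": CALL_FORWARDING rows are nested inside SPECIAL_FACILITY documents, which are in turn nested inside SUBSCRIBER documents. The `embed` helper, every field name other than `s_id`, and the sample rows are hypothetical; this is an illustration, not how mongodb-d4 itself builds documents.

```python
from collections import defaultdict


def embed(parents, children, key_fields, field):
    """Return copies of `parents`, each with a `field` list holding the
    children whose values on `key_fields` match the parent's."""
    def key_of(doc):
        return tuple(doc[f] for f in key_fields)

    by_key = defaultdict(list)
    for child in children:
        by_key[key_of(child)].append(child)
    # Parents without matching children get an empty list.
    return [{**p, field: by_key.get(key_of(p), [])} for p in parents]


# Tiny, made-up TATP-like rows (field names besides s_id are assumptions).
subscriber = [{"s_id": 1, "sub_nbr": "000000000000001"}]
special_facility = [
    {"s_id": 1, "sf_type": 1, "is_active": 1},
    {"s_id": 1, "sf_type": 2, "is_active": 0},
]
call_forwarding = [
    {"s_id": 1, "sf_type": 1, "start_time": 0, "numberx": "5550001"},
    {"s_id": 1, "sf_type": 1, "start_time": 8, "numberx": "5550002"},
]

# Design 2: CALL_FORWARDING -> SPECIAL_FACILITY -> SUBSCRIBER
sf_with_cf = embed(special_facility, call_forwarding, ("s_id", "sf_type"), "CALL_FORWARDING")
subscriber_docs = embed(subscriber, sf_with_cf, ("s_id",), "SPECIAL_FACILITY")
print(subscriber_docs[0])
```

In the resulting SUBSCRIBER document, a query that only needs subscriber fields still gets back the nested SPECIAL_FACILITY and CALL_FORWARDING arrays unless it uses a projection.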
In our current cost model, Design 2 has a lower cost than Design 1. However, the replay framework shows that Design 1 achieves much higher throughput than Design 2 (8286.57 op/s vs. 950.73 op/s overall, roughly 8.7x). Below are the replay framework's results:
Design 1
--------------------------------------------------------------------------
                    Executed           Total Time (ms)     Rate
Replay Queries      993561 - 100.0%    1379015.66553       11288.49 op/s
--------------------------------------------------------------------------
TOTAL               993561             119900.106192       8286.57 op/s
==========================================================================
Latency Report
--------------------------------------------------------------------------
Queries(%)    Latency(ms)
10.0%         0.1020
20.0%         0.5219
50.0%         0.7350
80.0%         1.1170
90.0%         1.5299
99.9%         22.9671
--------------------------------------------------------------------------
=============================================================================
Top 20 Slowest Operations
-----------------------------------------------------------------------------
#   Latency(ms)   Session Id   Operation Id   Type     Collection
0   855.7951      123          0              $query   SUBSCRIBER
1   855.0751      530          1              $query   SUBSCRIBER
2   854.6751      1715         1              $query   SUBSCRIBER
3   854.6152      1853         0              $query   SUBSCRIBER
4   854.6140      1216         0              $query   SPECIAL_FACILITY
5   854.5260      1601         0              $query   ACCESS_INFO
6   854.4500      1148         1              $query   SUBSCRIBER
7   854.3968      1548         0              $query   SUBSCRIBER
8   854.3952      364          0              $query   ACCESS_INFO
9   854.2671      116          0              $query   SUBSCRIBER
10  854.2390      1139         0              $query   ACCESS_INFO
11  854.2109      1651         0              $query   SUBSCRIBER
12  854.1081      1891         1              $query   CALL_FORWARDING
13  853.9870      1890         0              $query   SPECIAL_FACILITY
14  853.9579      1262         0              $query   SPECIAL_FACILITY
15  853.9290      1196         0              $query   ACCESS_INFO
16  853.8289      1536         0              $query   ACCESS_INFO
17  853.8220      781          0              $query   SUBSCRIBER
18  853.8101      882          0              $query   ACCESS_INFO
19  853.7490      71           0              $query   SUBSCRIBER
-----------------------------------------------------------------------------
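Each Latency Report row reads as a percentile: the latency under which that fraction of queries completed. The report does not say how the replay framework computes these values; the sketch below assumes the nearest-rank method, and its sample latencies are made up. Note that the 99.9% row describes only the slowest 0.1% of queries, about 994 of Design 1's 993561 operations and about 114 of Design 2's 114321.

```python
import math
from fractions import Fraction


def nearest_rank_percentile(latencies_ms, pct):
    """Return the smallest sample such that at least pct% of samples are <= it.

    pct goes through Fraction(str(pct)) so that 99.9 is handled exactly
    rather than as a slightly-off binary float.
    """
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    rank = math.ceil(Fraction(str(pct)) * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def latency_report(latencies_ms, pcts=(10.0, 20.0, 50.0, 80.0, 90.0, 99.9)):
    """Pair each percentile label with its nearest-rank latency."""
    return [(pct, nearest_rank_percentile(latencies_ms, pct)) for pct in pcts]


samples = [float(ms) for ms in range(1, 1001)]  # made-up latencies: 1 ms .. 1000 ms
for pct, ms in latency_report(samples):
    print(f"{pct:>5.1f}%  {ms:.4f}")
```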
Design 2
--------------------------------------------------------------------------
                    Executed           Total Time (ms)     Rate
Replay Queries      114321 - 100.0%    1436510.32448       1299.82 op/s
--------------------------------------------------------------------------
TOTAL               114321             120245.948076       950.73 op/s
==========================================================================
Latency Report
--------------------------------------------------------------------------
Queries(%)    Latency(ms)
10.0%         0.0801
20.0%         0.4001
50.0%         0.5720
80.0%         0.6790
90.0%         0.7560
99.9%         515.1670
--------------------------------------------------------------------------
==========================================================================
Top 20 Slowest Operations
--------------------------------------------------------------------------
#   Latency(ms)   Session Id   Operation Id   Type     Collection
0   556.4971      157          0              $query   SUBSCRIBER
1   556.2160      157          0              $query   SUBSCRIBER
2   551.5430      270          0              $query   ACCESS_INFO
3   545.9940      725          0              $query   SUBSCRIBER
4   545.8341      1170         0              $query   ACCESS_INFO
5   545.4969      117          0              $query   SUBSCRIBER
6   545.3651      1170         0              $query   ACCESS_INFO
7   544.0409      270          0              $query   ACCESS_INFO
8   543.6971      997          0              $query   SUBSCRIBER
9   543.6630      16           0              $query   SUBSCRIBER
10  540.9229      40           0              $query   ACCESS_INFO
11  540.8962      157          0              $query   SUBSCRIBER
12  540.5340      40           0              $query   ACCESS_INFO
13  540.5271      40           0              $query   ACCESS_INFO
14  540.2842      187          0              $query   SUBSCRIBER
15  540.1580      187          0              $query   SUBSCRIBER
16  540.0901      187          0              $query   SUBSCRIBER
17  539.8049      40           0              $query   ACCESS_INFO
18  539.7899      40           0              $query   ACCESS_INFO
19  539.6461      40           0              $query   ACCESS_INFO
--------------------------------------------------------------------------
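The TOTAL rows are consistent with the rate being the executed count divided by the wall-clock TOTAL time (for example, 993561 operations / 119.900 s ≈ 8286.57 op/s). A small sketch that recomputes both rates from numbers copied out of the reports above and compares the two latency reports percentile by percentile; the dictionary layout is only for illustration.

```python
# Figures copied from the TOTAL rows and Latency Reports above.
designs = {
    "Design 1": {"executed": 993561, "total_ms": 119900.106192,
                 "latency_ms": {10.0: 0.1020, 20.0: 0.5219, 50.0: 0.7350,
                                80.0: 1.1170, 90.0: 1.5299, 99.9: 22.9671}},
    "Design 2": {"executed": 114321, "total_ms": 120245.948076,
                 "latency_ms": {10.0: 0.0801, 20.0: 0.4001, 50.0: 0.5720,
                                80.0: 0.6790, 90.0: 0.7560, 99.9: 515.1670}},
}


def throughput(d):
    """Operations per second over the wall-clock TOTAL time."""
    return d["executed"] / (d["total_ms"] / 1000.0)


d1, d2 = designs["Design 1"], designs["Design 2"]
speedup = throughput(d1) / throughput(d2)
# Ratio > 1 means Design 2 is slower at that percentile.
latency_ratio = {p: d2["latency_ms"][p] / d1["latency_ms"][p] for p in d1["latency_ms"]}

print(f"throughput: {throughput(d1):.2f} vs {throughput(d2):.2f} op/s ({speedup:.1f}x)")
for pct, ratio in sorted(latency_ratio.items()):
    print(f"{pct:>5.1f}%: Design 2 / Design 1 latency = {ratio:.2f}")
```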
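Therefore, we need to figure out why denormalization decreases throughput so much, and then adjust our cost models accordingly. Note that the slowdown is not uniform: Design 2 has lower latency than Design 1 up to the 90th percentile, but its 99.9th-percentile latency is 515.1670 ms versus 22.9671 ms for Design 1.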
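Once the cost model is adjusted, one way to guard against this kind of mismatch is to check that its ranking of designs agrees with the replayed throughput. A minimal sketch: `discordant_pairs` is a hypothetical helper, the measured rates come from the TOTAL rows above, and the cost values are placeholders because the issue only states that Design 2's cost is lower.

```python
from itertools import combinations


def discordant_pairs(cost, throughput):
    """Return (a, b) design pairs where the cost model and the replay disagree.

    Lower cost is taken to mean "predicted faster"; higher op/s means
    "measured faster". Pairs tied on either side are skipped.
    """
    disagreements = []
    for a, b in combinations(sorted(cost), 2):
        if cost[a] == cost[b] or throughput[a] == throughput[b]:
            continue
        predicted_a_faster = cost[a] < cost[b]
        measured_a_faster = throughput[a] > throughput[b]
        if predicted_a_faster != measured_a_faster:
            disagreements.append((a, b))
    return disagreements


measured_ops = {"Design 1": 8286.57, "Design 2": 950.73}  # TOTAL rates from the replay
placeholder_cost = {"Design 1": 1.0, "Design 2": 0.8}     # hypothetical; only the ordering is from the issue
print(discordant_pairs(placeholder_cost, measured_ops))   # [('Design 1', 'Design 2')]
```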