apisix icon indicating copy to clipboard operation
apisix copied to clipboard

apisix delay

Open zhixiang0608 opened this issue 4 years ago • 13 comments

Issue description

hello, Gateway delay p99 is much higher than other indicators image This is the result of my pressure test using wrk image My routing configuration is 2{7W( ~M4C27% JL`28@VLP

Environment

apisix vesion:apisix-2.6

zhixiang0608 avatar Sep 23 '21 02:09 zhixiang0608

can you paste the route configuration of zyml-getuserinfo-api as text?

tzssangglass avatar Sep 23 '21 03:09 tzssangglass

{"id":"373703982817739720","create_time":1632274531,"update_time":1632274531,"uris":["/zymk-getuserinfo-api/"],"name":"zymk-getuserinfo-api","methods":["GET","POST","PUT","DELETE","PATCH","HEAD","OPTIONS","CONNECT","TRACE"],"plugins":{"prometheus":{"disable":false,"prefer_name":true},"proxy-rewrite":{"headers":{"X-Api-Engine":"apisix","X-Api-Version":"v1"},"regex_uri":["^/zymk-getuserinfo-api/(.)","/$1$2$3"]}},"upstream":{"nodes":[{"host":"172.21.24.51","port":7202,"weight":1}],"timeout":{"connect":6,"read":6,"send":6},"type":"roundrobin","scheme":"http","pass_host":"pass"},"status":1}

zhixiang0608 avatar Sep 23 '21 05:09 zhixiang0608

@zhixiang0608 Have you ever tried to benchmark the upstream directly? It's unknown whether the long tail requests are due to upstream or APISIX itself.

tokers avatar Sep 23 '21 10:09 tokers

My reproduction steps are as follows

  1. branch: release/2.6

    case 1.1: route configuration:

    {
        "uris": [
            "/zymk-getuserinfo-api/"
        ],
        "plugins": {
            "proxy-rewrite": {
                "headers": {
                    "X-Api-Engine": "apisix",
                    "X-Api-Version": "v1"
                },
                "regex_uri": [
                    "^/zymk-getuserinfo-api/(.)",
                    "/$1$2$3"
                ]
            }
        },
        "upstream": {
            "nodes": {
                "127.0.0.1:1980": 1
            },
            "type": "roundrobin"
        }
    }
    

    benchmark:

    wrk2 -t4 -c60 -d10s -R4000 --latency http://127.0.0.1:9080/index.html/zymk-getuserinfo-api/
    Running 10s test @ http://127.0.0.1:9080/index.html/zymk-getuserinfo-api/
      4 threads and 60 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency     1.20ms  755.26us  13.91ms   80.72%
        Req/Sec        nan       nan   0.00      0.00%
      Latency Distribution (HdrHistogram - Recorded Latency)
     50.000%    1.14ms
     75.000%    1.56ms
     90.000%    1.93ms
     99.000%    2.75ms
     99.900%   10.99ms
     99.990%   13.77ms
     99.999%   13.92ms
    100.000%   13.92ms
    

    case 1.2: route configuration:

    {
        "uris": [
            "/zymk-getuserinfo-api/"
        ],
        "upstream": {
            "nodes": {
                "127.0.0.1:1980": 1
            },
            "type": "roundrobin"
        }
    }
    

    benchmark:

    wrk2 -t4 -c60 -d10s -R4000 --latency http://127.0.0.1:9080/index.html/zymk-getuserinfo-api/
    Running 10s test @ http://127.0.0.1:9080/index.html/zymk-getuserinfo-api/
      4 threads and 60 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency     1.22ms  563.10us   9.66ms   67.77%
        Req/Sec        nan       nan   0.00      0.00%
      Latency Distribution (HdrHistogram - Recorded Latency)
     50.000%    1.19ms
     75.000%    1.59ms
     90.000%    1.93ms
     99.000%    2.62ms
     99.900%    3.32ms
     99.990%    8.47ms
     99.999%    9.67ms
    100.000%    9.67ms
    
  2. branch: master case 2.1: route configuration:

    {
        "uris": [
            "/zymk-getuserinfo-api/"
        ],
        "plugins": {
            "proxy-rewrite": {
                "headers": {
                    "X-Api-Engine": "apisix",
                    "X-Api-Version": "v1"
                },
                "regex_uri": [
                    "^/zymk-getuserinfo-api/(.)",
                    "/$1$2$3"
                ]
            }
        },
        "upstream": {
            "nodes": {
                "127.0.0.1:1980": 1
            },
            "type": "roundrobin"
        }
    }
    

    benchmark:

    wrk2 -t4 -c60 -d10s -R4000 --latency http://127.0.0.1:9080/index.html/zymk-getuserinfo-api/
    Running 10s test @ http://127.0.0.1:9080/index.html/zymk-getuserinfo-api/
      4 threads and 60 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency     1.21ms  663.60us  15.90ms   76.45%
        Req/Sec        nan       nan   0.00      0.00%
      Latency Distribution (HdrHistogram - Recorded Latency)
     50.000%    1.17ms
     75.000%    1.56ms
     90.000%    1.92ms
     99.000%    2.65ms
     99.900%    7.88ms
     99.990%   15.33ms
     99.999%   15.90ms
    100.000%   15.90ms
    

    case 2.2: route configuration:

    {
        "uris": [
            "/zymk-getuserinfo-api/"
        ],
        "upstream": {
            "nodes": {
                "127.0.0.1:1980": 1
            },
            "type": "roundrobin"
        }
    }
    

    benchmark:

    wrk2 -t4 -c60 -d10s -R4000 --latency http://127.0.0.1:9080/index.html/zymk-getuserinfo-api/
    Running 10s test @ http://127.0.0.1:9080/index.html/zymk-getuserinfo-api/
      4 threads and 60 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency     1.15ms  544.15us   3.64ms   66.16%
        Req/Sec        nan       nan   0.00      0.00%
      Latency Distribution (HdrHistogram - Recorded Latency)
     50.000%    1.14ms
     75.000%    1.52ms
     90.000%    1.86ms
     99.000%    2.57ms
     99.900%    3.08ms
     99.990%    3.44ms
     99.999%    3.64ms
    100.000%    3.64ms
    

from case 1.1 and case 1.2, it is related to the regular configuration of the proxy-rewrite plugin from case 1.2 and case 2.2, it is related to apisix version

but this does not explain the 59 ms delay.

Things you need to be sure of.

  1. whether /zymk-getuserinfo-api/ is the only route that has this problem, and if so, then it may be related to the upstream of this route.

  2. remove the prometheus and proxy-rewrite plugins from the /zymk-getuserinfo-api/ route separately to determine if it is an APISIX 2.6 issue or a plugin issue.

possibly relate PR: https://github.com/apache/apisix/pull/4405/files

tzssangglass avatar Sep 23 '21 11:09 tzssangglass

BTW, do not use wrk to do benchmark, see why:https://www.apiseven.com/zh/blog/analysis-of-inaccurate-distribution-of-wrk-latency, and disable the prometheus plugin, just use wrk2 to do:https://github.com/giltene/wrk2

tzssangglass avatar Sep 24 '21 01:09 tzssangglass

use https://github.com/tsenart/vegeta

macaty avatar Sep 25 '21 06:09 macaty

@zhixiang0608 Have you ever tried to benchmark the upstream directly? It's unknown whether the long tail requests are due to upstream or APISIX itself. ther routing delays are normal, only this one,using this formula, I think it has nothing to do with service apisix latency = latency - upstream_reponse_time

zhixiang0608 avatar Sep 26 '21 01:09 zhixiang0608

can you use openresty-xray to test this route? It's really weird, and it's really weird to remove prometheus and use wrk2.

tzssangglass avatar Sep 26 '21 01:09 tzssangglass

image Receive response body delay from upstream 😂

zhixiang0608 avatar Sep 26 '21 02:09 zhixiang0608

image Receive response body delay from upstream 😂

zhixiang0608 avatar Sep 26 '21 02:09 zhixiang0608

According to this formula, the time consumption of upstream responders shall be removed 9f60e2cb2e97dac0cbf32e0a5e1953d

zhixiang0608 avatar Sep 26 '21 02:09 zhixiang0608

You are saying that some requests are skipped apisix_latency = apisix_latency - upstream_latency? Or maybe some requests don't have upstream_response_time

Let me think, what is the timeout set upstream?

I'm wondering if it's possible that the upstream response timeout is causing these requests to close.

tzssangglass avatar Sep 27 '21 14:09 tzssangglass

This issue has been marked as stale due to 350 days of inactivity. It will be closed in 2 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the [email protected] list. Thank you for your contributions.

github-actions[bot] avatar Sep 13 '22 10:09 github-actions[bot]

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.

github-actions[bot] avatar Sep 27 '22 10:09 github-actions[bot]