
[THREESCALE-9537] Configure batcher policy storage

Open tkan145 opened this issue 1 year ago • 2 comments

What

Fixes: https://issues.redhat.com/browse/THREESCALE-9537

Dev notes

1. What shared dict value should we increase?

The 3scale batcher policy uses a few different shared dict caches:

lua_shared_dict cached_auths 20m;
lua_shared_dict batched_reports 20m;
lua_shared_dict batched_reports_locks 1m;

and api_keys if the Caching policy is included in the chain:

lua_shared_dict api_keys 30m;
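
With the defaults above, these zones reserve 20m + 20m + 1m = 41m of shared memory in total (plus 30m for api_keys when the caching policy is in the chain). A lua_shared_dict zone is allocated once and shared by all worker processes, so this is a per-gateway figure, not per worker.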

First, let's run a quick test to see how many entries 1m of shared dict can hold.

lua_shared_dict test 1m;

location = /t {
    content_by_lua_block {
        local rep = string.rep
        local dt = ngx.shared.test
        local val = rep("v", 15)
        local key = rep("k", 32)
        local i = 0
        while i < 200000 do
            -- safe_add never evicts, so it fails as soon as the zone is full
            local ok, err = dt:safe_add("service_id:_" .. i .. ",user_key:" .. key .. ",metric:hits", val)
            if not ok then
                break
            end
            i = i + 1
        end
        ngx.say(i, " key/value pairs inserted")
    }
}

The reason to use safe_add() here is to avoid the behaviour of set()/add(), which automatically evict the least recently used items when the zone runs out of memory.
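
As a side illustration (a minimal sketch, not part of the test above, assuming it runs after the loop has filled the same test dict): set() keeps succeeding by forcibly evicting older entries, while safe_add() reports the shortage.

-- Assumes the "test" dict has already been filled by the loop above.
local dt = ngx.shared.test

-- safe_add() never evicts, so it fails once the zone is full:
local ok, err = dt:safe_add("one_more_key", "value")
if not ok then
    ngx.say("safe_add failed: ", err)  -- typically "no memory"
end

-- set() succeeds anyway; its third return value reports whether valid
-- (unexpired) items had to be forcibly evicted to make room:
local success, set_err, forcible = dt:set("another_key", "value")
ngx.say("set ok: ", success, ", forcible eviction: ", forcible)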

Querying /t gives the following response body on my Linux x86_64 system. NOTE: you may get a different value, as this depends on the underlying architecture.

4033 key/value pairs inserted.

So a 1m store can hold 4033 key/value pairs with 57-byte keys and 15-byte values. In reality, the available space also depends on memory fragmentation, but since these key/value pairs are consistent in size, we should have no problem. More details here

Changing the size of the test dict to 10m gives

40673 key/value pairs inserted.

So a 10m store can hold 40673 pairs. The capacity grows linearly, as expected.
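
Extrapolating (a rough estimate only, since real capacity also depends on key length and allocation overhead): the default 20m batched_reports zone should hold on the order of 20 × 4033 ≈ 80,000 pairs of this size, which is in line with the 81409 pairs measured for 60-byte credentials in the table under question 2 below.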

| Name | Method | When full | TTL | Growth |
| --- | --- | --- | --- | --- |
| cached_auths | set/get | evicts old/expired keys | 10 | 1 for each new transaction |
| batched_reports | safe_add/incr | returns an error | | 1 for each transaction that returns 200; if the key exists, the existing key is updated. All keys are flushed after the batch_report_seconds timeout (default 10s) |
| batched_reports_locks | set/delete | evicts old/expired keys | None | 1 for each transaction, but the lock is released in the same function |
| api_keys | get/set/delete | evicts old/expired keys | None | |

So we can see that they all grow at the same rate, but due to the use of safe_add and the fact that reports are cached for 10 seconds, only batched_reports will return a "no memory" error.

A possible workaround is to set batch_report_seconds to a lower value.
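
For reference, here is a minimal sketch of the safe_add/incr pattern from the table above (hypothetical helper names, not the actual APIcast code), showing why batched_reports surfaces "no memory" instead of silently evicting, and why the periodic flush bounds its growth:

local reports = ngx.shared.batched_reports

-- Hypothetical helper, only to illustrate the pattern.
local function add_to_batch(report_key, delta)
    -- safe_add creates the key without evicting anything; when the zone
    -- is full it returns nil and "no memory" instead of dropping data.
    local ok, err = reports:safe_add(report_key, 0)
    if not ok and err ~= "exists" then
        return nil, err  -- e.g. "no memory": the batch storage is full
    end
    -- The key now exists with a numeric value, so incr just bumps it.
    return reports:incr(report_key, delta)
end

-- All pending keys are reported and removed every batch_report_seconds
-- (default 10s), so lowering that value empties the dict more often and
-- reduces the chance of hitting "no memory".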

2. Do I need to increase the batcher policy storage?

Let's do a small test and increase the key size:

| Shared dict | Size | Key format | Credential length | Key/value pairs |
| --- | --- | --- | --- | --- |
| batched_reports | 20m | service_id:<service_id>,user_key:<service_credential>,metric:<metric_name> | 60 | 81409 |
| | | | 120 | 81409 |
| | | | 142 | 40705 |
| | | | 400 | 20400 |

With keys larger than 400 bytes, for batched_reports to fill up completely it would take 20400/10 = 2040 req/sec. It's very unlikely that a single gateway will be hit with this much traffic.
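
More generally (assuming, as in the calculation above, that every request creates a distinct key), the sustained rate needed to fill the zone within one flush window is roughly capacity ÷ batch_report_seconds: with the default 20m zone that is about 81409 / 10 ≈ 8140 req/sec for 60-byte credentials, and even the 400-byte worst case only lowers it to the 2040 req/sec figure above.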

@eguzki do you know the highest load a single gateway can handle?

Verification Steps

Filling the storage is a bit tricky, so I just check that the generated configuration file contains the correct value.

  • Create an apicast-config.json file with the following content
cat <<EOF >apicast-config.json
{
    "services": [
        {
            "id": "1",
            "backend_version": "1",
            "proxy": {
                "hosts": [
                    "one"
                ],
                "api_backend": "https://echo-api.3scale.net:443",
                "backend": {
                    "endpoint": "http://127.0.0.1:8081",
                    "host": "backend"
                },
                "policy_chain": [
                    {
                        "name": "apicast.policy.apicast"
                    }
                ],
                "proxy_rules": [
                    {
                        "http_method": "GET",
                        "pattern": "/",
                        "metric_system_name": "hits",
                        "delta": 1,
                        "parameters": [],
                        "querystring_parameters": {}
                    }
                ]
            }
        }
    ]
} 
EOF
  • Check out this branch and start the dev environment
make development
make dependencies
  • Run apicast locally with APICAST_POLICY_BATCHER_SHARED_MEMORY_SIZE set to 40m
THREESCALE_DEPLOYMENT_ENV=staging APICAST_LOG_LEVEL=debug APICAST_WORKER=1 APICAST_CONFIGURATION_LOADER=lazy APICAST_CONFIGURATION_CACHE=0  APICAST_POLICY_BATCHER_SHARED_MEMORY_SIZE="40m" THREESCALE_CONFIG_FILE=apicast-config.json ./bin/apicast
  • Stop the gateway
CTRL-C
  • Check that lua_shared_dict batched_reports is set to 40m
$ grep -nr "batched_reports" /tmp

/tmp/lua_PRpxLW:67:lua_shared_dict batched_reports 40m;
/tmp/lua_PRpxLW:68:lua_shared_dict batched_reports_locks 1m;

tkan145 · Mar 08 '24 05:03

Also fix the test steps.

Question: Is the shared dict shared across all the 3scale products? So if I have 20m, is that shared between me and other 3scale users?

Yes, the shared dict is shared between workers and all 3scale products, and perhaps between users as well.

Maybe I would add some documentation with your tests, saying what you can get with the default values of the policy and the new env var for several key sizes. Then the same for half/double of the default value of batch_report_seconds. Same thing for half/double of the default value of the new env var.

Where do you think that doc should live? Inside the top-level doc or inside the policy?

tkan145 · Mar 22 '24 07:03

> the top level doc or inside the policy

I would say in the specific readme for the batcher policy: https://github.com/3scale/APIcast/blob/master/gateway/src/apicast/policy/3scale_batcher/README.md

eguzki · Mar 22 '24 08:03

Thanks @dfennessy. I will need your approval as well.

tkan145 · Apr 03 '24 00:04