
[THREESCALE-9537] Configure batcher policy storage

Open tkan145 opened this issue 1 year ago • 2 comments

What

Fixes: https://issues.redhat.com/browse/THREESCALE-9537

Dev notes

1. What shared dict value should we increase?

The 3scale batcher policy uses a few different shared dict caches:

lua_shared_dict cached_auths 20m;
lua_shared_dict batched_reports 20m;
lua_shared_dict batched_reports_locks 1m;

and api_keys if the Caching policy is included in the chain:

lua_shared_dict api_keys 30m;
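
With the defaults above, these zones reserve 20m + 20m + 1m = 41m of shared memory in total (plus 30m for api_keys when the caching policy is in the chain). A lua_shared_dict zone is allocated once and shared by all worker processes, so this is a per-gateway figure, not per worker.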

First, let's run a quick test to see how many entries 1m of shared dict can hold.

lua_shared_dict test 1m;

location = /t {
    content_by_lua_block {
        local rep = string.rep
        local dt = ngx.shared.test
        local val = rep("v", 15)
        local key = rep("k", 32)
        local i = 0
        while i < 200000 do
            -- safe_add never evicts, so it fails as soon as the zone is full
            local ok, err = dt:safe_add("service_id:_" .. i .. ",user_key:" .. key .. ",metric:hits", val)
            if not ok then
                break
            end
            i = i + 1
        end
        ngx.say(i, " key/value pairs inserted")
    }
}

The reason to use safe_add() here is to avoid the behaviour of set()/add(), which automatically evict the least recently used items when the zone runs out of memory.
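
As a side illustration (a minimal sketch, not part of the test above, assuming it runs after the loop has filled the same test dict): set() keeps succeeding by forcibly evicting older entries, while safe_add() reports the shortage.

-- Assumes the "test" dict has already been filled by the loop above.
local dt = ngx.shared.test

-- safe_add() never evicts, so it fails once the zone is full:
local ok, err = dt:safe_add("one_more_key", "value")
if not ok then
    ngx.say("safe_add failed: ", err)  -- typically "no memory"
end

-- set() succeeds anyway; its third return value reports whether valid
-- (unexpired) items had to be forcibly evicted to make room:
local success, set_err, forcible = dt:set("another_key", "value")
ngx.say("set ok: ", success, ", forcible eviction: ", forcible)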

Querying /t gives the following response body on my Linux x86_64 system. NOTE: you may get a different value, as this depends on the underlying architecture.

4033 key/value pairs inserted.

So a 1m store can hold 4033 key/value pairs with 57-byte keys and 15-byte values. In reality, the available space also depends on memory fragmentation, but since these key/value pairs are consistent in size, we should have no problem. More details here

Changing the size of the test dict to 10m gives

40673 key/value pairs inserted.

So a 10m store can hold 40673 pairs. The capacity grows linearly, as expected.
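
Extrapolating (a rough estimate only, since real capacity also depends on key length and allocation overhead): the default 20m batched_reports zone should hold on the order of 20 × 4033 ≈ 80,000 pairs of this size, which is in line with the 81409 pairs measured for 60-byte credentials in the table under question 2 below.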

| Name | Method | When full | TTL | Growth |
| --- | --- | --- | --- | --- |
| cached_auths | set/get | evicts old/expired keys | 10 | 1 for each new transaction |
| batched_reports | safe_add/incr | returns an error | | 1 for each transaction that returns 200; if the key exists, the existing key is updated. All keys are flushed after the batch_report_seconds timeout (default 10s) |
| batched_reports_locks | set/delete | evicts old/expired keys | None | 1 for each transaction, but the lock is released in the same function |
| api_keys | get/set/delete | evicts old/expired keys | None | |

So we can see that they all grow at the same rate, but due to the use of safe_add and the fact that reports are cached for 10 seconds, only batched_reports will return a "no memory" error.

A possible workaround is to set batch_report_seconds to a lower value.
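
For reference, here is a minimal sketch of the safe_add/incr pattern from the table above (hypothetical helper names, not the actual APIcast code), showing why batched_reports surfaces "no memory" instead of silently evicting, and why the periodic flush bounds its growth:

local reports = ngx.shared.batched_reports

-- Hypothetical helper, only to illustrate the pattern.
local function add_to_batch(report_key, delta)
    -- safe_add creates the key without evicting anything; when the zone
    -- is full it returns nil and "no memory" instead of dropping data.
    local ok, err = reports:safe_add(report_key, 0)
    if not ok and err ~= "exists" then
        return nil, err  -- e.g. "no memory": the batch storage is full
    end
    -- The key now exists with a numeric value, so incr just bumps it.
    return reports:incr(report_key, delta)
end

-- All pending keys are reported and removed every batch_report_seconds
-- (default 10s), so lowering that value empties the dict more often and
-- reduces the chance of hitting "no memory".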

2. Do I need to increase the batcher policy storage?

Let's do a small test and increase the key size:

| Shared dict | Size | Key format | Credential length | Key/value pairs |
| --- | --- | --- | --- | --- |
| batched_reports | 20m | service_id:<service_id>,user_key:<service_credential>,metric:<metric_name> | 60 | 81409 |
| | | | 120 | 81409 |
| | | | 142 | 40705 |
| | | | 400 | 20400 |

With keys larger than 400 bytes, for batched_reports to fill up completely it would take 20400/10 = 2040 req/sec. It's very unlikely that a single gateway will be hit with this much traffic.
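
More generally (assuming, as in the calculation above, that every request creates a distinct key), the sustained rate needed to fill the zone within one flush window is roughly capacity ÷ batch_report_seconds: with the default 20m zone that is about 81409 / 10 ≈ 8140 req/sec for 60-byte credentials, and even the 400-byte worst case only lowers it to the 2040 req/sec figure above.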

@eguzki do you know the highest load a single gateway can handle?

Verification Steps

Filling the storage is a bit tricky, so I just check that the generated configuration file contains the correct value.

  • Create an apicast-config.json file with the following content
cat <<EOF >apicast-config.json
{
    "services": [
        {
            "id": "1",
            "backend_version": "1",
            "proxy": {
                "hosts": [
                    "one"
                ],
                "api_backend": "https://echo-api.3scale.net:443",
                "backend": {
                    "endpoint": "http://127.0.0.1:8081",
                    "host": "backend"
                },
                "policy_chain": [
                    {
                        "name": "apicast.policy.apicast"
                    }
                ],
                "proxy_rules": [
                    {
                        "http_method": "GET",
                        "pattern": "/",
                        "metric_system_name": "hits",
                        "delta": 1,
                        "parameters": [],
                        "querystring_parameters": {}
                    }
                ]
            }
        }
    ]
} 
EOF
  • Check out this branch and start the dev environment
make development
make dependencies
  • Run apicast locally with APICAST_POLICY_BATCHER_SHARED_MEMORY_SIZE set to 40m
THREESCALE_DEPLOYMENT_ENV=staging APICAST_LOG_LEVEL=debug APICAST_WORKER=1 APICAST_CONFIGURATION_LOADER=lazy APICAST_CONFIGURATION_CACHE=0  APICAST_POLICY_BATCHER_SHARED_MEMORY_SIZE="40m" THREESCALE_CONFIG_FILE=apicast-config.json ./bin/apicast
  • Stop the gateway
CTRL-C
  • Check that lua_shared_dict batched_reports is set to 40m
$ grep -nr "batched_reports" /tmp

/tmp/lua_PRpxLW:67:lua_shared_dict batched_reports 40m;
/tmp/lua_PRpxLW:68:lua_shared_dict batched_reports_locks 1m;

tkan145 · Mar 08 '24 05:03

Also fix the test steps.

Question: Is the shared dict shared across all the 3scale products? So if I have 20m, is that shared between me and other 3scale users?

Yes, the shared dict is shared between workers and all 3scale products, and perhaps between users as well.

Maybe I would add some documentation with your tests, saying what you can get with the default values of the policy and the new env var for several key sizes. Then the same for half/double of the default value of batch_report_seconds. Same thing for half/double of the default value of the new env var.

Where do you think that doc should live? Inside the top-level doc or inside the policy?

tkan145 · Mar 22 '24 07:03

> the top level doc or inside the policy

I would say in the specific readme for the batcher policy: https://github.com/3scale/APIcast/blob/master/gateway/src/apicast/policy/3scale_batcher/README.md

eguzki · Mar 22 '24 08:03

Thanks @dfennessy. I will need your approval as well.

tkan145 · Apr 03 '24 00:04