[THREESCALE-9537] Configure batcher policy storage
What
Fixes: https://issues.redhat.com/browse/THREESCALE-9537
Dev notes
1. What shared dict value should we increase?
The 3scale batcher policy uses a few different shared dict caches:

```nginx
lua_shared_dict cached_auths 20m;
lua_shared_dict batched_reports 20m;
lua_shared_dict batched_reports_locks 1m;
```

and `api_keys` if the caching policy is included in the chain:

```nginx
lua_shared_dict api_keys 30m;
```
First, let's run a test to see how much 1m of shared cache can hold.
```nginx
lua_shared_dict test 1m;

location = /t {
    content_by_lua_block {
        local rep = string.rep
        local dt = ngx.shared.test
        local val = rep("v", 15)  -- 15-byte value
        local key = rep("k", 32)  -- 32-byte credential part of the key
        local i = 0
        while i < 200000 do
            -- safe_add() fails instead of evicting older items when the dict is full
            local ok, err = dt:safe_add("service_id:_" .. i .. ",user_key:" .. key .. ",metric:hits", val)
            if not ok then
                break
            end
            i = i + 1
        end
        ngx.say(i, " key/value pairs inserted")
    }
}
```
The reason to use safe_add() here is to prevent the automatic eviction of least-recently-used items that set() performs upon memory shortage.
Querying /t gives the following response body on my Linux x86_64 system. NOTE: you may get a different value, as this actually depends on the underlying architecture.
```
4033 key/value pairs inserted.
```
So a 1m store can hold 4033 key/value pairs with 57-byte keys and 15-byte values. In reality, the actual available space will depend on memory fragmentation, but since these key/value pairs are consistent in size, we should have no problem. More details here
Changing the dict size to 10m gives

```
40673 key/value pairs inserted.
```

So a 10m store can hold 40673 pairs. That's linear growth, as expected.
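A quick back-of-the-envelope check of the per-entry cost, using the numbers above (a sketch; the exact figure depends on the shared dict's internal bookkeeping, not just key and value sizes):

```shell
# approximate bytes consumed per key/value pair in the shared dict
echo $(( 1024 * 1024 / 4033 ))        # 1m dict  -> 259 bytes/entry
echo $(( 10 * 1024 * 1024 / 40673 ))  # 10m dict -> 257 bytes/entry
```

The per-entry cost is roughly constant across dict sizes, which is consistent with the linear growth observed.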
| Name | Method | When full | TTL | Growth |
|---|---|---|---|---|
| cached_auths | set/get | evicts old/expired keys | 10s | 1 for each new transaction |
| batched_reports | safe_add/incr | returns an error | None; all keys are flushed after the batch report timeout (`batch_report_seconds`, default 10s) | 1 for each transaction that returns 200; if the key exists, the existing key is updated |
| batched_reports_locks | set/delete | evicts old/expired keys | None | 1 for each transaction, but the lock is released in the same function |
| api_keys | get/set/delete | evicts old/expired keys | None | |
So we can see that all of them grow equally, but due to the use of safe_add() and the fact that reports are cached for 10 seconds, only `batched_reports` will return a "no memory" error.
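The safe_add/incr pattern from the table can be sketched as follows (a simplified illustration of the batching idea, not the actual policy code; `add_report`, `report_key`, and `delta` are hypothetical names):

```lua
local batched_reports = ngx.shared.batched_reports

-- Add the counter if it is new; increment it if it already exists.
-- safe_add() returns err == "exists" when the key is already present,
-- and err == "no memory" when the dict is full, instead of evicting
-- older entries the way set() would.
local function add_report(report_key, delta)
  local ok, err = batched_reports:safe_add(report_key, delta)
  if not ok and err == "exists" then
    return batched_reports:incr(report_key, delta)
  end
  return ok, err
end
```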
A possible workaround is to set `batch_report_seconds` to a lower value, so the dict is flushed more often.
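For example, the flush interval can be lowered in the policy chain configuration (a sketch; 5 seconds is an arbitrary example value):

```json
{
  "name": "apicast.policy.3scale_batcher",
  "configuration": {
    "batch_report_seconds": 5
  }
}
```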
2. Do I need to increase the batcher policy storage?
Let's do a small test and increase the key size:
| Key | Value | Key format | Credential length | Key/value pairs |
|---|---|---|---|---|
| batched_reports | 20m | service_id:<service_id>,user_key:<service_credential>,metric:<metric_name> | 60 | 81409 |
| | | | 120 | 81409 |
| | | | 142 | 40705 |
| | | | 400 | 20400 |
With keys larger than 400 bytes, for `batched_reports` to be completely filled it would require 20400/10 = 2040 req/sec (with the default `batch_report_seconds` of 10). It's very unlikely that a single gateway will be hit with that much traffic.
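The required rate comes from dividing the dict capacity by the flush interval (a sketch; it assumes every request produces a distinct report key, i.e. a unique service/credential/metric combination):

```shell
capacity=20400       # key/value pairs a 20m dict holds with >400-byte keys
flush_interval=10    # default batch_report_seconds
echo $(( capacity / flush_interval ))   # -> 2040 req/sec needed to fill the dict before a flush
```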
@eguzki do you know what is the highest load a single gateway can handle?
Verification Steps
Filling the storage is a bit tricky, so I just check that the generated configuration file contains the correct value.
- Create an apicast-config.json file with the following content:
```shell
cat <<EOF >apicast-config.json
{
  "services": [
    {
      "id": "1",
      "backend_version": "1",
      "proxy": {
        "hosts": [
          "one"
        ],
        "api_backend": "https://echo-api.3scale.net:443",
        "backend": {
          "endpoint": "http://127.0.0.1:8081",
          "host": "backend"
        },
        "policy_chain": [
          {
            "name": "apicast.policy.apicast"
          }
        ],
        "proxy_rules": [
          {
            "http_method": "GET",
            "pattern": "/",
            "metric_system_name": "hits",
            "delta": 1,
            "parameters": [],
            "querystring_parameters": {}
          }
        ]
      }
    }
  ]
}
EOF
```
- Checkout this branch and start the dev environment:

```shell
make development
make dependencies
```
- Run APIcast locally with `APICAST_POLICY_BATCHER_SHARED_MEMORY_SIZE` set to 40m:

```shell
THREESCALE_DEPLOYMENT_ENV=staging APICAST_LOG_LEVEL=debug APICAST_WORKER=1 APICAST_CONFIGURATION_LOADER=lazy APICAST_CONFIGURATION_CACHE=0 APICAST_POLICY_BATCHER_SHARED_MEMORY_SIZE="40m" THREESCALE_CONFIG_FILE=apicast-config.json ./bin/apicast
```
- Stop the gateway with `CTRL-C`
- Check that `lua_shared_dict batched_reports` is set to 40m:

```shell
$ grep -nr "batched_reports" /tmp
/tmp/lua_PRpxLW:67:lua_shared_dict batched_reports 40m;
/tmp/lua_PRpxLW:68:lua_shared_dict batched_reports_locks 1m;
```
Also fix the test steps.
Question: Is the shared dict shared across all the 3scale products? So if I have 20m, is it shared between me and other 3scale users?
Yes, the shared dict is shared between workers and all 3scale products, and perhaps users also.
Maybe I would add some documentation with your tests, saying what you can get for the default values of the policy and the new env var for several key sizes. Then the same for half/double of the default value of `batch_report_seconds`, and the same for half/double of the default value of the new env var.
Where do you think that doc would live? inside the top level doc or inside the policy?
> the top level doc or inside the policy
I would say in the specific readme for the batcher policy: https://github.com/3scale/APIcast/blob/master/gateway/src/apicast/policy/3scale_batcher/README.md
Thanks @dfennessy. I will need your approval as well.