OOM after hot reload with JSVM enabled
Branch/Environment/Version
- Branch/Version: v4.0.3
- Environment: Running open source version with redis docker container with gateway binary.
- OS details
linux dominic-XPS-15-9550 5.0.0-32-generic #34~18.04.2-Ubuntu SMP Thu Oct 10 10:36:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
- Memory info;
MemTotal: 8024280 kB
SwapTotal: 10334200 kB
- CPU info: 8 cores, Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
Describe the bug
When JSVM is enabled and I run a hot reload via http://localhost:8080/tyk/reload/group, the RAM used by Tyk nearly doubles. If you run it again, memory increases even further, and if you keep running it Tyk eventually runs out of memory and is killed. I've attached the API specs I used. They contain 2000 JSVM endpoints with this basic JS:
function echoHeaderVirtualHandler(request, session, config) {
    log("log() - inside echoHeaderVirtualHandler")
    log("Request header type:" + typeof JSON.stringify(request.Headers))
    log("Request header:" + JSON.stringify(request.Headers))
    var responseObject = {
        Headers: {
            "Content-Type": "application/json",
        },
        Body: "" + JSON.stringify(request) + "\n",
        Code: 200
    }
    log("log() - exiting echoHeaderVirtualHandler")
    return TykJsResponse(responseObject, session.meta_data)
}
These are loaded along with the default APIs that come with Tyk. I think this issue also exists on older versions, as I have noticed it on v2.9.4.7, although memory doesn't increase as much there. That may be because Tyk has gained new features over the years, so each API spec now takes up more memory. This issue was raised some time ago: https://github.com/TykTechnologies/tyk/issues/496
The problem lies here when we re-build each API spec https://github.com/TykTechnologies/tyk/blob/v4.0.3/gateway/api_definition.go#L534
paths, _ := filepath.Glob(filepath.Join(dir, "*.json"))
for _, path := range paths {
    log.Info("Loading API Specification from ", path)
    f, err := os.Open(path)
    if err != nil {
        log.Error("Couldn't open api configuration file: ", err)
        continue
    }
    def := a.ParseDefinition(f)
    f.Close()
    spec := a.MakeSpec(&def, nil)
    specs = append(specs, spec)
}
Using delve, I can see that once the loop is complete the memory has nearly doubled. The problem is that even if only 1 API is updated, every API is rebuilt with a.MakeSpec(&def, nil), so we then have double the number of APIs in memory. Once the reload is complete, and after a long time, the GC eventually cleans up the old APIs. I can't find any reference to the old APIs within the code, so my guess is that the GC does not clean them up straight away but in phases: there is so much memory to clean up that doing it all at once would cause a big degradation in performance. Just a theory.
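The GC theory above can be checked in isolation. The following is a minimal sketch, not Tyk code: `allocate` and `measure` are hypothetical helpers, and the 1 MiB blocks merely stand in for loaded API specs. It reads heap usage while the blocks are still referenced, then drops the reference, forces a collection with `runtime.GC()`, and reads it again.

```go
package main

import (
	"fmt"
	"runtime"
)

// heapMB returns the size of allocated heap objects in MiB.
func heapMB() uint64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc >> 20
}

// allocate builds n blocks of 1 MiB each (a stand-in for a set of API specs).
func allocate(n int) [][]byte {
	blocks := make([][]byte, n)
	for i := range blocks {
		blocks[i] = make([]byte, 1<<20)
	}
	return blocks
}

// measure reports heap usage while the blocks are referenced, and again
// after the reference is dropped and a collection has been forced.
func measure(n int) (held, released uint64) {
	blocks := allocate(n)
	held = heapMB()
	runtime.KeepAlive(blocks) // blocks are guaranteed reachable up to here
	blocks = nil              // drop the only reference
	runtime.GC()              // blocks until the collection is complete
	released = heapMB()
	return held, released
}

func main() {
	held, released := measure(64)
	fmt.Printf("held: %d MiB, after GC: %d MiB\n", held, released)
}
```

Dropping the last reference does not shrink the heap by itself; the memory only comes back once a collection has actually run, which is consistent with usage staying high for a while after a reload.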
I've created a fix for the issue. I've only tested it with loading APIs from a directory, so it will need testing via dashboard and RPC. Essentially, it computes a SHA256 hash of each API definition. When a reload is triggered, it computes the hash of the incoming API: if the hash hasn't changed it uses the API spec already in memory, and if it has changed it calls a.MakeSpec(&def, nil). Memory now stays stable across API reloads, and the reload is also a lot quicker.
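The hash-and-reuse idea described above can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: `spec`, `makeSpec`, `checksum`, and `reload` are hypothetical names, and `makeSpec` only stands in for the expensive `a.MakeSpec` call.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// spec is a hypothetical stand-in for a built API spec.
type spec struct {
	ID       string
	Checksum string
}

// builds counts how often the expensive build step ran.
var builds int

// makeSpec stands in for the expensive a.MakeSpec call.
func makeSpec(id, sum string) *spec {
	builds++
	return &spec{ID: id, Checksum: sum}
}

// checksum returns the hex-encoded SHA-256 of the raw definition bytes.
func checksum(raw []byte) string {
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:])
}

// reload builds the next set of specs from raw definitions keyed by API ID,
// reusing any spec in prev whose definition hash is unchanged.
func reload(prev map[string]*spec, defs map[string][]byte) map[string]*spec {
	next := make(map[string]*spec, len(defs))
	for id, raw := range defs {
		sum := checksum(raw)
		if old, ok := prev[id]; ok && old.Checksum == sum {
			next[id] = old // unchanged: keep the in-memory spec
			continue
		}
		next[id] = makeSpec(id, sum) // new or changed: rebuild
	}
	return next
}

func main() {
	defs := map[string][]byte{
		"api-1": []byte(`{"name":"one"}`),
		"api-2": []byte(`{"name":"two"}`),
	}
	specs := reload(nil, defs)
	fmt.Println("builds after first load:", builds) // 2

	defs["api-2"] = []byte(`{"name":"two-changed"}`)
	specs = reload(specs, defs)
	fmt.Println("builds after reload:", builds, "specs:", len(specs)) // 3, 2
}
```

Hashing the raw bytes means that a whitespace-only change also triggers a rebuild; hashing a normalized form of the definition would avoid that, at the cost of parsing first.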
How do I go about submitting my PR as I don't have permission to push up my branch to the tyk repo?
Reproduction steps
Steps to reproduce the behavior:
- Export APIs in attached zip to ./apps directory
- Configure Tyk to use the tyk.conf attached
- Once the APIs are loaded, confirm one API works with
curl http://localhost:8080/jsvm-testing-some-api-id-22
- Response;
{"Body":"","Headers":{"Accept":["*/*"],"User-Agent":["curl/7.58.0"]},"Params":{},"Scheme":"http","URL":"/jsvm-testing-some-api-id-1"}
- Trigger reload with
curl -H "x-tyk-authorization: 352d20ee67be67f6340b4c0605b044b7" -s http://localhost:8080/tyk/reload/group
- Monitor memory of tyk via
top
and notice the big increase
- Once the reload is complete, memory remains at this high value
- If you repeat the reload 3 or 4 more times, Tyk eventually runs out of memory and the process is killed
tyk.conf
{
  "listen_port": 8080,
  "secret": "352d20ee67be67f6340b4c0605b044b7",
  "template_path": "./templates",
  "middleware_path": "./middleware",
  "use_db_app_configs": false,
  "app_path": "./apps/",
  "log_level": "debug",
  "storage": {
    "type": "redis",
    "host": "localhost",
    "port": 6379,
    "username": "",
    "password": "",
    "database": 0,
    "optimisation_max_idle": 2000,
    "optimisation_max_active": 4000
  },
  "enable_analytics": false,
  "analytics_config": {
    "type": "csv",
    "csv_dir": "/tmp",
    "mongo_url": "",
    "mongo_db_name": "",
    "mongo_collection": "",
    "purge_delay": -1,
    "ignored_ips": []
  },
  "health_check": {
    "enable_health_checks": true,
    "health_check_value_timeouts": 60
  },
  "optimisations_use_async_session_write": false,
  "enable_non_transactional_rate_limiter": true,
  "enable_sentinel_rate_limiter": false,
  "enable_redis_rolling_limiter": false,
  "allow_master_keys": false,
  "policies": {
    "policy_source": "file",
    "policy_record_name": "./policies/policies.json"
  },
  "hash_keys": true,
  "close_connections": false,
  "http_server_options": {
    "enable_websockets": true
  },
  "allow_insecure_configs": true,
  "coprocess_options": {
    "enable_coprocess": true,
    "coprocess_grpc_server": ""
  },
  "enable_bundle_downloader": true,
  "bundle_base_url": "",
  "global_session_lifetime": 100,
  "force_global_session_lifetime": false,
  "max_idle_connections_per_host": 500,
  "enable_jsvm": true,
  "enable_http_profiler": true
}
[apps.zip](https://github.com/TykTechnologies/tyk/files/9240378/apps.zip)
Hello!
Ace research!
> How do I go about submitting my PR as I don't have permission to push up my branch to the tyk repo?

Create a fork and submit the PR from the fork?
I have a feeling that your solution masks the issue rather than fully resolving it (though this is definitely something we have thought about in the context of speeding up API reloads). You will probably continue to see a similar issue when you do a bunch of API updates (e.g. each real API update will still increase memory a bit for that single API).
The proper fix for this exact problem would be to ensure that an API cleans up its resources when it is reloaded; in this case, that means stopping its JSVM engine. Alternatively, JSVM engines could be put into a shared pool of objects that all APIs re-use.
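The shared-pool suggestion could look roughly like the sketch below. It assumes a fixed-size pool backed by a buffered channel; `vm`, `newVM`, and `pool` are hypothetical names rather than Tyk or otto types, and a real implementation would also have to deal with per-API scripts and state isolation.

```go
package main

import "fmt"

// vm is a hypothetical stand-in for a JS engine instance.
type vm struct{ id int }

// created counts how many VMs were ever constructed.
var created int

// newVM stands in for the costly engine setup (base JS, underscore, ...).
func newVM() *vm {
	created++
	return &vm{id: created}
}

// pool holds a fixed number of idle VMs in a buffered channel.
type pool struct{ idle chan *vm }

// newPool creates size VMs up front.
func newPool(size int) *pool {
	p := &pool{idle: make(chan *vm, size)}
	for i := 0; i < size; i++ {
		p.idle <- newVM()
	}
	return p
}

// run borrows a VM for one call and returns it afterwards.
// It blocks while all VMs are in use.
func (p *pool) run(fn func(*vm)) {
	v := <-p.idle
	defer func() { p.idle <- v }()
	fn(v)
}

func main() {
	p := newPool(2)
	// 2000 calls, e.g. one per API, share the same two VMs.
	for i := 0; i < 2000; i++ {
		p.run(func(v *vm) { _ = v.id })
	}
	fmt.Println("VMs created:", created) // VMs created: 2
}
```

With a pool, the per-VM base footprint is paid once per pooled VM rather than once per API, at the cost of contention when more requests need a VM than the pool holds.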
Would love to see your PR, as a starting point!
Yeah, I agree the APIs should be cleaned up upon reload... but from what I can see there is no longer a reference to the old APIs once gw.apisHandlesByID and gw.apisByID are replaced.
Good suggestion on the shared pool of objects that JSVM could use... I've noticed that even if a JSVM API has no JS, the VM still takes up around 2MB of memory, because it imports the base JS from https://github.com/TykTechnologies/tyk/blob/v4.0.3/gateway/mw_js_plugin.go#L639 and also the underscore JS from https://github.com/robertkrimen/otto/tree/master/underscore, which is obviously repeated across all JSVM APIs.
Ok cool, will create a fork and submit the PR.
Created a PR https://github.com/TykTechnologies/tyk/pull/4215
I've added changes to the PR to remove the API definition from the context once the Go plugin has finished with the request, so the GC can pick it up upon reload, and to also kill VMs on reload.
I don't think cleaning up resources is something the code has to do explicitly, as there are no more references to the previous APIs after the reload.
Look at what the CPU is doing via pprof immediately after the reload: in the attached screenshot, 93.47% of CPU is consumed by the GC marking objects on the heap for deletion. After the CPU returns to a normal state, the garbage collection completes and the memory returns to normal. This takes a minute or 2 because we have unnecessarily reloaded so many API specs.
So I don't think this PR is masking the issue as the GC will remove the old definition, so memory won't build up with updates. What do you think?
I think a JSVM instance itself leaves some active memory footprint until you explicitly stop it. E.g. maybe before loading the new set of APIs inside apiReload, we can traverse the current APIs and explicitly stop their JSVM VMs?
I've stopped it here, where it's releasing other resources: https://github.com/TykTechnologies/tyk/pull/4215/files#diff-0cf80174bbafb36f6d4f4308ebbd971b2833b76a936bad568220aa1a4ba0ee8bR220-R226 Is that the right place to do it?
Ah, yes you are right.
So basically we can say that this PR addresses the issue in 2 distinct ways:
- Cleaning up JSVM VM memory
- Doing the above (and a full API reload) only when an API has actually changed, which makes it even faster and less noticeable
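The two points above can be combined in one small sketch: after a reload, stop the VM of every old spec that was not carried over unchanged. The names `jsvm`, `spec`, and `releaseReplaced` are hypothetical and this is not the code from the PR; `Stop` only stands in for whatever shuts the engine down.

```go
package main

import "fmt"

// jsvm is a hypothetical stand-in for a per-API JS engine.
type jsvm struct{ stopped bool }

// Stop marks the engine as shut down so its resources can be released.
func (v *jsvm) Stop() { v.stopped = true }

// spec is a hypothetical stand-in for a loaded API spec.
type spec struct {
	ID string
	VM *jsvm
}

// releaseReplaced stops the VM of every spec in prev that is not reused
// (pointer-identical) in next, and returns how many VMs were stopped.
func releaseReplaced(prev, next map[string]*spec) int {
	stopped := 0
	for id, old := range prev {
		if next[id] == old {
			continue // reused unchanged: its VM must keep running
		}
		if old.VM != nil {
			old.VM.Stop()
			stopped++
		}
	}
	return stopped
}

func main() {
	a := &spec{ID: "a", VM: &jsvm{}}
	b := &spec{ID: "b", VM: &jsvm{}}
	c := &spec{ID: "c", VM: &jsvm{}}
	prev := map[string]*spec{"a": a, "b": b, "c": c}

	// After reload: "a" is reused, "b" was rebuilt, "c" was removed.
	next := map[string]*spec{"a": a, "b": {ID: "b", VM: &jsvm{}}}

	fmt.Println("VMs stopped:", releaseReplaced(prev, next)) // VMs stopped: 2
}
```

Comparing pointers rather than IDs matters here: a rebuilt spec keeps its ID but gets a new VM, so the old VM still has to be stopped.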
Sounds good to me
Hi @buger have you had a chance to look at the PR yet?
@domsolutions Can you try versions 4.0.7 or 4.2.2? This seems to have been fixed in those versions.
Closing as fixed in Tyk 4.0.7/4.2.2.
Please do not hesitate to re-open if you still have this problem.
Thank you for supporting Tyk!