CPU burn / slow >= v0.14.0
Describe the bug
Running HashiCorp Boundary in a container (at least) burns CPU, and each API request from the UI is slow (~1s).
To Reproduce
Steps to reproduce the behavior in dev mode:
- Order a fresh Linux VM with Ubuntu 22.04 LTS (AMD EPYC, 4 cores, 16 GB RAM)
- Install Docker
- Fix permissions for the test (a horrible fix, but it's just to reproduce): sudo chmod 777 /var/run/docker.sock
- Run Boundary in dev mode:
docker run --net=host -v /var/run/docker.sock:/var/run/docker.sock --rm hashicorp/boundary:latest dev
- Boundary starts and then CPU burns
Steps to reproduce the behavior in dev mode with external Postgres 15 (another machine):
- Order a fresh Linux VM with Ubuntu 22.04 LTS (AMD EPYC, 4 cores, 16 GB RAM)
- Install Docker
- Run Boundary in dev mode:
docker run --rm hashicorp/boundary:latest dev -database-url=XXXXX
- Boundary starts (but really slowly, it takes a few minutes) and then CPU burns
Steps to reproduce the behavior in production mode with external Postgres 15 (another machine):
- Order a fresh Linux VM with Ubuntu 22.04 LTS (AMD EPYC, 4 cores, 16 GB RAM)
- Install Docker
- Run Boundary with a production config that enables the worker
- Boundary starts (but is really slow) and then CPU burns. Each API request from the UI takes about 1s
However, in production mode, I found that disabling the worker stops the CPU burn, but API requests from the UI are still really slow.
If I run in dev mode on my laptop (Apple M1 Pro) without Docker, I don't have the CPU burn issue and everything is fast. If I run on the VMs without a container, I also have the issue (possibly related to Ubuntu 22.04 LTS?).
It has been tested on two different cloud providers for VMs, both running Ubuntu 22.04 LTS
I don't have the CPU burn issue with v0.13.1, but I do with v0.14.0 and v0.14.1. However, API requests, and therefore the UI, are also really slow with v0.13.1.
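To compare versions with numbers rather than impressions, request latency can be timed from the client side. The following is a minimal sketch, not part of the original report: the helper names (fetch, time_calls, report, main) are hypothetical, and the address http://127.0.0.1:9200 is an assumption about where the API listener is reachable. The path /v1/scopes?scope_id=global is one of the slow requests seen from the UI; without an auth token the server may answer with an error status, which is still timed as a completed round trip.

```python
import time
import urllib.error
import urllib.request
from statistics import median


def fetch(url, timeout=10.0):
    """Issue one GET and return the HTTP status code.

    An HTTP error status (for example 401 or 403) is returned rather than
    raised, because the round trip still completed and can be timed.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


def time_calls(call, n=10):
    """Run call() n times and return the elapsed time of each run in ms."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def report(samples):
    """Reduce a list of latencies (ms) to min, median and max."""
    return {
        "min_ms": min(samples),
        "median_ms": median(samples),
        "max_ms": max(samples),
    }


def main(url, n=10):
    """Time n GET requests against url and print the summary."""
    print(report(time_calls(lambda: fetch(url), n=n)))
```

Calling main("http://127.0.0.1:9200/v1/scopes?scope_id=global") against each version (adjusting the assumed address) gives figures that can be compared directly with the latency-ms values the server reports.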
Expected behavior
HashiCorp Boundary is responsive and no longer burns CPU.
Looked into the issue and it seems to be caused by a long-running background check which was using a lot of CPU.
We have started working on a fix for this issue: https://github.com/hashicorp/boundary/pull/3884
@guyguy333 The latest Boundary 0.14.2 release has the fix for this issue. Let me know if it resolves the problem for you.
Thanks a lot @elimt, I can confirm it solved CPU burning issue.
However, I still see severe slowness in production mode, with latency-ms around 900 for most requests:
{"id":"7JE1s4IfXH","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":0.165161,"request_info":{"id":"gtraceid_qEvPzHFlfTv5afaUN18p","method":"GET","path":"/assets/chunk.143.00d6b02cc76cee2b78af.css","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:05.173461747Z","status":200,"stop":"2023-11-03T10:25:05.173626908Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:05.17368055Z"} {"id":"oBfB2hT88d","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":87.068323,"request_info":{"id":"gtraceid_m9aY7XMw4e7BQznWx0Rh","method":"GET","path":"/assets/admin-af689c1f154f54624ca33cae48e25b28.js","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:05.172516341Z","status":200,"stop":"2023-11-03T10:25:05.259584645Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:05.259649468Z"} {"id":"Ov4VjUvk8a","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":775.192092,"request_info":{"id":"gtraceid_cz5gffR4UiJdnGvNu1RN","method":"POST","path":"/v1/auth-methods/amoidc_e3nnv6V9iW:authenticate","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:04.649536737Z","status":200,"stop":"2023-11-03T10:25:05.424728829Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:05.424770388Z"} 
{"id":"UlC7nw8Jqc","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":0.082395,"request_info":{"id":"gtraceid_u45HSFxDh04IwApjfBXB","method":"GET","path":"/metadata.json","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:06.38484681Z","status":200,"stop":"2023-11-03T10:25:06.384929205Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:06.384938022Z"} {"id":"6fwtp8bCs8","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":972.740802,"request_info":{"id":"gtraceid_ykSoQDOJxeyDsYUV1yGq","method":"GET","path":"/v1/scopes","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:05.629973416Z","status":200,"stop":"2023-11-03T10:25:06.602714228Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:06.602746249Z"} {"id":"gW9n1x0HW4","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":989.508232,"request_info":{"id":"gtraceid_v7EVJ5z8C832jsvq1iBF","method":"GET","path":"/v1/scopes/global","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:05.638015536Z","status":200,"stop":"2023-11-03T10:25:06.627523798Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:06.627539117Z"} 
{"id":"0nAfUIaAlM","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":991.043242,"request_info":{"id":"gtraceid_ZeNqL68w8Ep6k89PbiiS","method":"GET","path":"/v1/scopes?scope_id=global","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:05.637693848Z","status":200,"stop":"2023-11-03T10:25:06.62873713Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:06.62877961Z"} {"id":"dmo3UDeyhx","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":1034.120546,"request_info":{"id":"gtraceid_vSNloLY7eOOyGeMICnkM","method":"GET","path":"/v1/auth-tokens/at_qeTkERl3Kn","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:06.400616576Z","status":200,"stop":"2023-11-03T10:25:07.434737122Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:07.434803057Z"} {"id":"wxJ0JH4n36","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":921.721597,"request_info":{"id":"gtraceid_FTcYuCbTBy9plwjom4hk","method":"GET","path":"/v1/scopes?scope_id=global","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:06.709372454Z","status":200,"stop":"2023-11-03T10:25:07.631094051Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:07.631149856Z"} 
{"id":"TxhfbhHCK4","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":817.078066,"request_info":{"id":"gtraceid_GvSTQVGSLlW9dzh1Zd1y","method":"GET","path":"/v1/scopes/o_HdUl3IdO2r","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:11.130487691Z","status":200,"stop":"2023-11-03T10:25:11.947565747Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:11.947604821Z"} {"id":"S0K7cZqppj","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":829.705026,"request_info":{"id":"gtraceid_3Lmc0JvTAItSFaRuTUG5","method":"GET","path":"/v1/scopes?scope_id=global","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:11.121902266Z","status":200,"stop":"2023-11-03T10:25:11.951607302Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:11.951625405Z"} {"id":"6UDn6LHuZp","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":794.742167,"request_info":{"id":"gtraceid_OdsIzgdP7IqEUAI98Lzc","method":"GET","path":"/v1/scopes/global","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:12.027060117Z","status":200,"stop":"2023-11-03T10:25:12.821802244Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:12.821868149Z"} 
{"id":"zVfv7ZhDK0","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":837.209621,"request_info":{"id":"gtraceid_OovTBM7tqQyYfLWBOuoN","method":"GET","path":"/v1/scopes?scope_id=o_HdUl3IdO2r","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:12.027403876Z","status":403,"stop":"2023-11-03T10:25:12.864613497Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:12.864644635Z"} {"id":"3I1Z01FrTl","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":917.06497,"request_info":{"id":"gtraceid_O8XAuCtUbexM1ZrtKIcv","method":"GET","path":"/v1/scopes?scope_id=global","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:14.604024804Z","status":200,"stop":"2023-11-03T10:25:15.521089724Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:15.521150309Z"} {"id":"2R8E3gK70f","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":968.21363,"request_info":{"id":"gtraceid_IzyR4Go2cX603TWf9Q4X","method":"GET","path":"/v1/scopes/global","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:14.604082573Z","status":200,"stop":"2023-11-03T10:25:15.572296163Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:15.572363149Z"} 
{"id":"q0EfrkM7rA","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":998.116061,"request_info":{"id":"gtraceid_9Qo8NLOuhNR3chIFLRjh","method":"GET","path":"/v1/scopes?scope_id=global","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:15.575405335Z","status":200,"stop":"2023-11-03T10:25:16.573521366Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:16.573599573Z"} {"id":"Eaj6I2Jil7","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":0.576419,"request_info":{"id":"gtraceid_JWkXTK7wGREHnh3PJu4A","method":"GET","path":"/","client_ip":"100.64.5.87"},"start":"2023-11-03T10:25:31.438582075Z","status":200,"stop":"2023-11-03T10:25:31.439158494Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:31.439219119Z"} {"id":"7zSXs64jqr","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":0.97378,"request_info":{"id":"gtraceid_FipMaGN8ciNNg7zdyWMi","method":"GET","path":"/","client_ip":"100.64.5.87"},"start":"2023-11-03T10:25:31.438571835Z","status":200,"stop":"2023-11-03T10:25:31.439545605Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:31.439597793Z"}
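The events above make the pattern visible: static assets return in well under a millisecond to tens of milliseconds, while /v1/ API calls take 800 to 1000 ms. A minimal sketch for summarizing such output is shown below. It assumes the log text contains only JSON objects separated by whitespace, possibly several per line as pasted here. The function names and the SAMPLE string are hypothetical, and SAMPLE holds three events trimmed down to the fields the script reads.

```python
import json
from collections import defaultdict
from statistics import mean

# Three events from the log above, trimmed to the fields used below.
SAMPLE = (
    '{"data":{"latency-ms":0.082395,"request_info":{"method":"GET","path":"/metadata.json"},"status":200}} '
    '{"data":{"latency-ms":972.740802,"request_info":{"method":"GET","path":"/v1/scopes"},"status":200}}\n'
    '{"data":{"latency-ms":989.508232,"request_info":{"method":"GET","path":"/v1/scopes/global"},"status":200}}\n'
)


def iter_events(text):
    """Yield each JSON object in text; objects are separated by whitespace."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def summarize(text):
    """Group latency-ms into 'api' (/v1/ paths) and 'static' (everything else)."""
    groups = defaultdict(list)
    for event in iter_events(text):
        data = event.get("data", {})
        info = data.get("request_info", {})
        if "latency-ms" not in data or "path" not in info:
            continue  # not a request observation
        kind = "api" if info["path"].startswith("/v1/") else "static"
        groups[kind].append(data["latency-ms"])
    return {
        kind: {"count": len(vals), "mean_ms": mean(vals), "max_ms": max(vals)}
        for kind, vals in groups.items()
    }


print(summarize(SAMPLE))
```

Passing the full log text to summarize instead of SAMPLE gives the same two-group breakdown for all events.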
@guyguy333
- Could you please provide details about your setup?
- Could you also share your system specs? Boundary has a page about system requirements
Hey @guyguy333 -- are you still encountering this issue?
Hi @AdamBouhmad, yes I still have the issue. The app is slow and request latency is about 900ms, resulting in a really slow UI.
To answer @elimt and provide more details: I run the container on a K8S cluster (hashicorp/boundary:0.14.3). There is no CPU or memory limit for this container.
I set these env vars: BOUNDARY_POSTGRES_URL, HOSTNAME (= boundary) and VAULT_TOKEN.
The server is started using boundary server -config /boundary/config.hcl
My config:
disable_mlock = true
log_format = "json"
controller {
  name = "kubernetes-controller"
  description = "Boundary Controller"
  public_cluster_addr = "boundary-cluster.example.com:443"
  database {
    url = "env://BOUNDARY_POSTGRES_URL"
    max_open_connections = 10
    max_idle_connections = 10
  }
}
# Ingress TCP Route
worker {
  name = "kubernetes-worker"
  description = "Boundary Worker"
  address = "localhost"
  initial_upstreams = ["boundary:9201"]
  public_addr = "boundary-worker.example.com:443"
}
# Ingress
listener "tcp" {
  address = "0.0.0.0"
  purpose = "api"
  tls_disable = true
  cors_enabled = true
  cors_allowed_origins = ["https://boundary.example.com"]
}
listener "tcp" {
  address = "0.0.0.0"
  purpose = "cluster"
  tls_cert_file = "/certs/tls.crt"
  tls_key_file = "/certs/tls.key"
}
listener "tcp" {
  address = "0.0.0.0"
  purpose = "proxy"
  tls_cert_file = "/certs/tls.crt"
  tls_key_file = "/certs/tls.key"
}
kms "transit" {
  purpose = "root"
  address = "https://vault.example.com"
  // Key configuration
  key_name = "boundary-root"
  mount_path = "transit/"
}
kms "transit" {
  purpose = "recovery"
  address = "https://vault.example.com"
  // Key configuration
  key_name = "boundary-recovery"
  mount_path = "transit/"
}
kms "transit" {
  purpose = "worker-auth"
  address = "https://vault.example.com"
  // Key configuration
  key_name = "boundary-worker-auth"
  mount_path = "transit/"
}
Certs are mounted from a K8S secret in /certs.
The host machine uses the AMD64 architecture and has 4 cores and 32 GB RAM.
Currently there is no load; it's only me testing the solution to get something stable before opening it up.
Thanks for considering the issue :)