CPU burn / slow >= v0.14.0

Open guyguy333 opened this issue 1 year ago • 6 comments

Describe the bug

Running HashiCorp Boundary in a container (at least) burns CPU, and each API request from the UI is slow (~1s).

Screenshot 2023-10-22 at 11 42 22

To Reproduce

Steps to reproduce the behavior in dev mode:

  1. Order a fresh Linux VM with Ubuntu 22.04 LTS (AMD EPYC, 4 cores, 16 GB RAM)
  2. Install Docker
  3. Fix permissions for the test (a horrible fix, but it's just to reproduce): sudo chmod 777 /var/run/docker.sock
  4. Run Boundary in dev mode: docker run --net=host -v /var/run/docker.sock:/var/run/docker.sock --rm hashicorp/boundary:latest dev
  5. Boundary starts and then the CPU burns

Steps to reproduce the behavior in dev mode with external Postgres 15 (another machine):

  1. Order a fresh Linux VM with Ubuntu 22.04 LTS (AMD EPYC, 4 cores, 16 GB RAM)
  2. Install Docker
  3. Run Boundary in dev mode: docker run --rm hashicorp/boundary:latest dev -database-url=XXXXX
  4. Boundary starts (but startup is really slow, it takes a few minutes) and then the CPU burns

Steps to reproduce the behavior in production mode with external Postgres 15 (another machine):

  1. Order a fresh Linux VM with Ubuntu 22.04 LTS (AMD EPYC, 4 cores, 16 GB RAM)
  2. Install Docker
  3. Run Boundary in production mode with a config that enables the worker
  4. Boundary starts (but startup is really slow) and then the CPU burns. Each API request from the UI takes about 1s

However, in production mode, I found that disabling the worker stops the CPU burn, but API requests from the UI are still really slow.

If I run in dev mode on my laptop (Apple M1 Pro) without Docker, I don't have the CPU burn issue and everything is fast. If I run on the VMs without a container, I also have the issue (is it related to Linux Ubuntu 22.04 LTS?).

It has been tested on VMs from two different cloud providers, both running Ubuntu 22.04 LTS.

I don't have the CPU burn issue with v0.13.1, but I do have it with v0.14.0 and v0.14.1. However, API requests are really slow with v0.13.1 as well, and so is the UI.
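To separate UI slowness from API slowness, the request time can also be measured directly against the controller. The following is only a minimal sketch: the URL (`http://127.0.0.1:9200/v1/scopes`, the usual dev-mode API address) and the helper names `fetch_status`, `time_requests` and `summarize` are assumptions for illustration, not part of Boundary. The request is unauthenticated, so the status code may well be an error; only the elapsed time matters here.

```python
import statistics
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=10.0):
    """Issue one GET and return the HTTP status code (4xx/5xx included)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # drain the body so the full response is timed
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the round trip still completed.
        return err.code


def time_requests(url, count=5, fetch=fetch_status, clock=time.perf_counter):
    """Return a list of (status, elapsed_ms) for `count` sequential GETs."""
    samples = []
    for _ in range(count):
        start = clock()
        status = fetch(url)
        samples.append((status, (clock() - start) * 1000.0))
    return samples


def summarize(samples):
    """Reduce (status, elapsed_ms) samples to count, median and max latency."""
    latencies = [ms for _, ms in samples]
    return {
        "count": len(latencies),
        "median_ms": statistics.median(latencies),
        "max_ms": max(latencies),
    }


# Example usage (assumed address, adjust to the actual API listener):
#   print(summarize(time_requests("http://127.0.0.1:9200/v1/scopes")))
```

Running the same measurement against v0.13.1 and v0.14.x on the same VM would show whether the ~1s per request is visible outside the browser too.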

Expected behavior

HashiCorp Boundary is responsive and no longer burns CPU.

guyguy333 avatar Oct 22 '23 09:10 guyguy333

Looked into the issue and it seems to be caused by a long-running background check which was using a lot of CPU.

We have started working on a fix for this issue: https://github.com/hashicorp/boundary/pull/3884

elimt avatar Oct 23 '23 21:10 elimt

@guyguy333 The latest Boundary 0.14.2 release has the fix for this issue. Let me know if that helps.

elimt avatar Nov 02 '23 17:11 elimt

Thanks a lot @elimt, I can confirm it solved the CPU burning issue.

However, I still have huge slowness in production mode, with latency-ms around 900 ms for most API requests.

{"id":"7JE1s4IfXH","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":0.165161,"request_info":{"id":"gtraceid_qEvPzHFlfTv5afaUN18p","method":"GET","path":"/assets/chunk.143.00d6b02cc76cee2b78af.css","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:05.173461747Z","status":200,"stop":"2023-11-03T10:25:05.173626908Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:05.17368055Z"}
{"id":"oBfB2hT88d","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":87.068323,"request_info":{"id":"gtraceid_m9aY7XMw4e7BQznWx0Rh","method":"GET","path":"/assets/admin-af689c1f154f54624ca33cae48e25b28.js","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:05.172516341Z","status":200,"stop":"2023-11-03T10:25:05.259584645Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:05.259649468Z"}
{"id":"Ov4VjUvk8a","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":775.192092,"request_info":{"id":"gtraceid_cz5gffR4UiJdnGvNu1RN","method":"POST","path":"/v1/auth-methods/amoidc_e3nnv6V9iW:authenticate","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:04.649536737Z","status":200,"stop":"2023-11-03T10:25:05.424728829Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:05.424770388Z"}
{"id":"UlC7nw8Jqc","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":0.082395,"request_info":{"id":"gtraceid_u45HSFxDh04IwApjfBXB","method":"GET","path":"/metadata.json","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:06.38484681Z","status":200,"stop":"2023-11-03T10:25:06.384929205Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:06.384938022Z"}
{"id":"6fwtp8bCs8","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":972.740802,"request_info":{"id":"gtraceid_ykSoQDOJxeyDsYUV1yGq","method":"GET","path":"/v1/scopes","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:05.629973416Z","status":200,"stop":"2023-11-03T10:25:06.602714228Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:06.602746249Z"}
{"id":"gW9n1x0HW4","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":989.508232,"request_info":{"id":"gtraceid_v7EVJ5z8C832jsvq1iBF","method":"GET","path":"/v1/scopes/global","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:05.638015536Z","status":200,"stop":"2023-11-03T10:25:06.627523798Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:06.627539117Z"}
{"id":"0nAfUIaAlM","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":991.043242,"request_info":{"id":"gtraceid_ZeNqL68w8Ep6k89PbiiS","method":"GET","path":"/v1/scopes?scope_id=global","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:05.637693848Z","status":200,"stop":"2023-11-03T10:25:06.62873713Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:06.62877961Z"}
{"id":"dmo3UDeyhx","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":1034.120546,"request_info":{"id":"gtraceid_vSNloLY7eOOyGeMICnkM","method":"GET","path":"/v1/auth-tokens/at_qeTkERl3Kn","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:06.400616576Z","status":200,"stop":"2023-11-03T10:25:07.434737122Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:07.434803057Z"}
{"id":"wxJ0JH4n36","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":921.721597,"request_info":{"id":"gtraceid_FTcYuCbTBy9plwjom4hk","method":"GET","path":"/v1/scopes?scope_id=global","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:06.709372454Z","status":200,"stop":"2023-11-03T10:25:07.631094051Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:07.631149856Z"}
{"id":"TxhfbhHCK4","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":817.078066,"request_info":{"id":"gtraceid_GvSTQVGSLlW9dzh1Zd1y","method":"GET","path":"/v1/scopes/o_HdUl3IdO2r","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:11.130487691Z","status":200,"stop":"2023-11-03T10:25:11.947565747Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:11.947604821Z"}
{"id":"S0K7cZqppj","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":829.705026,"request_info":{"id":"gtraceid_3Lmc0JvTAItSFaRuTUG5","method":"GET","path":"/v1/scopes?scope_id=global","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:11.121902266Z","status":200,"stop":"2023-11-03T10:25:11.951607302Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:11.951625405Z"}
{"id":"6UDn6LHuZp","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":794.742167,"request_info":{"id":"gtraceid_OdsIzgdP7IqEUAI98Lzc","method":"GET","path":"/v1/scopes/global","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:12.027060117Z","status":200,"stop":"2023-11-03T10:25:12.821802244Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:12.821868149Z"}
{"id":"zVfv7ZhDK0","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":837.209621,"request_info":{"id":"gtraceid_OovTBM7tqQyYfLWBOuoN","method":"GET","path":"/v1/scopes?scope_id=o_HdUl3IdO2r","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:12.027403876Z","status":403,"stop":"2023-11-03T10:25:12.864613497Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:12.864644635Z"}
{"id":"3I1Z01FrTl","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":917.06497,"request_info":{"id":"gtraceid_O8XAuCtUbexM1ZrtKIcv","method":"GET","path":"/v1/scopes?scope_id=global","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:14.604024804Z","status":200,"stop":"2023-11-03T10:25:15.521089724Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:15.521150309Z"}
{"id":"2R8E3gK70f","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":968.21363,"request_info":{"id":"gtraceid_IzyR4Go2cX603TWf9Q4X","method":"GET","path":"/v1/scopes/global","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:14.604082573Z","status":200,"stop":"2023-11-03T10:25:15.572296163Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:15.572363149Z"}
{"id":"q0EfrkM7rA","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":998.116061,"request_info":{"id":"gtraceid_9Qo8NLOuhNR3chIFLRjh","method":"GET","path":"/v1/scopes?scope_id=global","public_id":"at_qeTkERl3Kn","client_ip":"100.64.5.145"},"start":"2023-11-03T10:25:15.575405335Z","status":200,"stop":"2023-11-03T10:25:16.573521366Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:16.573599573Z"}
{"id":"Eaj6I2Jil7","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":0.576419,"request_info":{"id":"gtraceid_JWkXTK7wGREHnh3PJu4A","method":"GET","path":"/","client_ip":"100.64.5.87"},"start":"2023-11-03T10:25:31.438582075Z","status":200,"stop":"2023-11-03T10:25:31.439158494Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:31.439219119Z"}
{"id":"7zSXs64jqr","source":"https://hashicorp.com/boundary/boundary-656c67dfb8-k2mw8/controller+worker","specversion":"1.0","type":"observation","data":{"latency-ms":0.97378,"request_info":{"id":"gtraceid_FipMaGN8ciNNg7zdyWMi","method":"GET","path":"/","client_ip":"100.64.5.87"},"start":"2023-11-03T10:25:31.438571835Z","status":200,"stop":"2023-11-03T10:25:31.439545605Z","version":"v0.1"},"datacontentype":"application/cloudevents","time":"2023-11-03T10:25:31.439597793Z"}
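In these events the static paths (/, /assets/..., /metadata.json) complete in well under 100 ms, while the /v1/ API paths take roughly 775 to 1034 ms. A small script can make that split explicit for a longer log. This is a minimal sketch, assuming only the fields visible in the events above (`type`, `data.latency-ms`, `data.request_info.path`); the function names and the "api"/"static" grouping are illustrative, and `SAMPLE` holds three of the events with unused fields trimmed.

```python
import json
from collections import defaultdict
from statistics import median

# Three events from the log above, trimmed to the fields this sketch reads.
SAMPLE = (
    '{"type":"observation","data":{"latency-ms":0.082395,'
    '"request_info":{"method":"GET","path":"/metadata.json"},"status":200}} '
    '{"type":"observation","data":{"latency-ms":972.740802,'
    '"request_info":{"method":"GET","path":"/v1/scopes"},"status":200}}\n'
    '{"type":"observation","data":{"latency-ms":989.508232,'
    '"request_info":{"method":"GET","path":"/v1/scopes/global"},"status":200}}'
)


def parse_events(text):
    """Decode every JSON object in `text`, whether newline- or space-separated."""
    decoder = json.JSONDecoder()
    events = []
    pos, end = 0, len(text)
    while True:
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos == end:
            break
        obj, pos = decoder.raw_decode(text, pos)
        events.append(obj)
    return events


def latency_by_group(events):
    """Group observation latencies into 'api' (/v1/ paths) and 'static'."""
    groups = defaultdict(list)
    for event in events:
        data = event.get("data", {})
        if event.get("type") != "observation" or "latency-ms" not in data:
            continue
        path = data.get("request_info", {}).get("path", "")
        group = "api" if path.startswith("/v1/") else "static"
        groups[group].append(data["latency-ms"])
    return {
        name: {"count": len(values), "median_ms": median(values), "max_ms": max(values)}
        for name, values in groups.items()
    }


stats = latency_by_group(parse_events(SAMPLE))
for name in sorted(stats):
    print(name, stats[name])
```

Feeding the full controller log to `parse_events` instead of `SAMPLE` gives the same breakdown over every request.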

guyguy333 avatar Nov 03 '23 10:11 guyguy333

@guyguy333

  1. Could you please provide details about your setup?
  2. Could you also share your system specs? Boundary has a page about system requirements

elimt avatar Nov 03 '23 16:11 elimt

Hey @guyguy333 -- are you still encountering this issue?

AdamBouhmad avatar Jan 02 '24 23:01 AdamBouhmad

Hi @AdamBouhmad, yes I still have the issue. The app is slow and request latency is about 900 ms, resulting in a really slow UI.

To answer @elimt and provide more details: I run the container on a K8S cluster (hashicorp/boundary:0.14.3). There is no CPU or memory limit for this container. I set these env vars: BOUNDARY_POSTGRES_URL, HOSTNAME (= boundary) and VAULT_TOKEN. The server is started using boundary server -config /boundary/config.hcl

My config:

disable_mlock = true
log_format    = "json"

controller {
  name        = "kubernetes-controller"
  description = "Boundary Controller"
  public_cluster_addr = "boundary-cluster.example.com:443"

  database {
    url = "env://BOUNDARY_POSTGRES_URL"
    max_open_connections = 10
    max_idle_connections = 10
  }
}

# Ingress TCP Route
worker {
  name              = "kubernetes-worker"
  description       = "Boundary Worker"
  address           = "localhost"
  initial_upstreams = ["boundary:9201"]
  public_addr       = "boundary-worker.example.com:443"
}

# Ingress
listener "tcp" {
  address              = "0.0.0.0"
  purpose              = "api"
  tls_disable          = true
  cors_enabled         = true
  cors_allowed_origins = ["https://boundary.example.com"]
}

listener "tcp" {
  address       = "0.0.0.0"
  purpose       = "cluster"
  tls_cert_file = "/certs/tls.crt"
  tls_key_file  = "/certs/tls.key"
}

listener "tcp" {
  address       = "0.0.0.0"
  purpose       = "proxy"
  tls_cert_file = "/certs/tls.crt"
  tls_key_file  = "/certs/tls.key"
}

kms "transit" {
  purpose            = "root"
  address            = "https://vault.example.com"

  // Key configuration
  key_name           = "boundary-root"
  mount_path         = "transit/"
}

kms "transit" {
  purpose            = "recovery"
  address            = "https://vault.example.com"

  // Key configuration
  key_name           = "boundary-recovery"
  mount_path         = "transit/"
}

kms "transit" {
  purpose            = "worker-auth"
  address            = "https://vault.example.com"

  // Key configuration
  key_name           = "boundary-worker-auth"
  mount_path         = "transit/"
}

Certs are mounted from a K8S secret in /certs.

The host machine uses the AMD64 architecture and has 4 cores and 32 GB RAM.

Currently there is no load; it's only me testing the solution, to have something stable before opening it up.
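Since this config depends on an external Postgres (via BOUNDARY_POSTGRES_URL) and on Vault's transit engine for all three KMS blocks, one thing that might be worth ruling out is plain network round-trip time from the pod to those services. This is only a sketch of such a check, not a diagnosis: the helper `connect_ms` is hypothetical, the Postgres host name and port 5432 are placeholders (the real values are inside the env var), and only the Vault address comes from the config above.

```python
import socket
import time


def connect_ms(host, port, timeout=5.0,
               connect=socket.create_connection, clock=time.perf_counter):
    """Return the time in milliseconds needed to open a TCP connection."""
    start = clock()
    conn = connect((host, port), timeout)
    elapsed = (clock() - start) * 1000.0
    conn.close()
    return elapsed


# Example usage, run from inside the Boundary pod.
# "postgres.example.com" and 5432 are placeholders, not values from this thread:
#   for host, port in [("vault.example.com", 443), ("postgres.example.com", 5432)]:
#       print(host, port, round(connect_ms(host, port), 1), "ms")
```

A TCP connect only measures one round trip (plus DNS resolution), so a low number here does not exclude slow queries or slow Vault calls; a high number would point at the network path rather than at Boundary itself.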

Thanks for considering the issue :)

guyguy333 avatar Jan 03 '24 10:01 guyguy333