self-hosted
Improve cpu_count by os.sched_getaffinity
Problem Statement
I deployed self-hosted Sentry on a large-scale server. We split the server into many LXC containers, and run Docker nested inside them.
For example, I have a physical server with 112 cores, and on it an LXC container with an 8-core CPU limit. I deployed self-hosted Sentry inside that LXC container.
The problem is that Sentry starts so many workers that they exhaust the CPU and memory.
I found that:
- Sentry thinks the CPU count is 112 rather than 8, so it started 112 worker processes.
- `docker exec sentry-worker-1 python3 -c "from multiprocessing import cpu_count; print(cpu_count())"` returns 112.
- `docker exec sentry-worker-1 python3 -c "import os; print(os.cpu_count())"` returns 112.
- `docker exec sentry-worker-1 python3 -c "import os; print(len(os.sched_getaffinity(0)))"` returns 8.
- `docker exec sentry-worker-1 nproc` returns 8.
References:
- https://bugs.python.org/issue36054
Solution Brainstorm
I think we should replace all cpu_count() calls with len(os.sched_getaffinity(0)).
Product Area
Performance
Assigning to @getsentry/support for routing, due by (sfo). ⏲️
Quick lookup here: it looks like we may run into issues, since os.sched_getaffinity(0) is not available on some Unix platforms such as OSX:
https://stackoverflow.com/a/55423170/6401252
Can we check whether sched_getaffinity is available, and otherwise fall back to cpu_count?
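A minimal sketch of that fallback check (the helper name `available_cpu_count` is my own, not something from Sentry's codebase):

```python
import multiprocessing
import os


def available_cpu_count() -> int:
    """CPUs usable by the current process, honoring affinity limits.

    os.sched_getaffinity(0) reflects the scheduler affinity mask (and
    thus cpuset restrictions) on Linux, but the function does not exist
    on some platforms such as macOS, so fall back to
    multiprocessing.cpu_count() there.
    """
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return multiprocessing.cpu_count()


print(available_cpu_count())
```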
As a workaround, could you add a parameter in your docker-compose.yml or docker-compose.override.yml file to set the concurrency for your worker? Your use case seems pretty specific, so I'm wondering if this will work for you:
https://github.com/getsentry/sentry/blob/master/src/sentry/runner/commands/run.py#L290
@hubertdeng123 Sure, that's my current workaround. I've modified docker-compose.yml and it works.
But I think it would be better if Sentry could determine cpu_count correctly, or at least provide an environment variable to configure the worker count.
Maybe we can do something like:
services:
worker:
command: run worker --concurrency ${SENTRY_WORKER_COUNT:-$(nproc)}
I found that ${SENTRY_WORKER_COUNT:-$(nproc)} only works in a shell (e.g. entrypoint.sh).
It does not work in the command of docker-compose.yml, which is why I said do something LIKE that.
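To illustrate the point above: docker-compose performs variable substitution but not command substitution in `command:`, so `$(nproc)` stays a literal string there, while a shell entrypoint expands it. A hedged sketch of such an entrypoint (the SENTRY_WORKER_COUNT variable name comes from the earlier comment; the commented-out `sentry run worker` invocation is an assumption about how the worker is launched):

```shell
#!/bin/sh
# Sketch of an entrypoint.sh: the shell expands $(nproc) here,
# unlike the `command:` key in docker-compose.yml.
CONCURRENCY="${SENTRY_WORKER_COUNT:-$(nproc)}"
echo "starting worker with concurrency=$CONCURRENCY"
# exec sentry run worker --concurrency "$CONCURRENCY"
```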
This issue has gone three weeks without activity. In another week, I will close it.
But! If you comment or otherwise update it, I will reset the clock, and if you label it Status: Backlog or Status: In Progress, I will leave it alone ... forever!
"A weed is but an unloved flower." ― Ella Wheeler Wilcox 🥀
I revisited this question. The core issue is that Python does not take the cgroup cpuset limits into account, while nproc does.
> cat /sys/fs/cgroup/cpuset.cpus.effective
1,3-4,8,13-14,16,19
I think we should label it Status: Backlog at least. This is indeed a problem, although limited in scope.
@PMExtra could you please explain what exactly you added to docker-compose.yml to explicitly tell Sentry the --concurrency value? We're also on LXC and experiencing the same issue.