
increase postgres max_connections above 100 connections

Open erfantkerfan opened this issue 1 year ago • 18 comments

In my case, the default 100 connections in Postgres is hitting its limit and needs to be set to a higher value. Since other people might need to change this, appending it to the Postgres flags seems only logical.

Legal Boilerplate

Look, I get it. The entity doing business as "Sentry" was incorporated in the State of Delaware in 2015 as Functional Software, Inc. and is gonna need some rights from me in order to utilize my contributions in this here PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Sentry can use, modify, copy, and redistribute my contributions, under Sentry's choice of terms.

erfantkerfan avatar Jan 28 '24 16:01 erfantkerfan

I think the better approach would be to take the value from an env variable and default to 100
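In docker-compose terms, that could look something like this sketch (the variable name `POSTGRES_MAX_CONNECTIONS` and the image tag are illustrative assumptions, not necessarily what the PR uses):

```yaml
# Hypothetical excerpt from docker-compose.yml: the env var overrides
# max_connections and falls back to the Postgres default of 100.
services:
  postgres:
    image: postgres
    command: postgres -c max_connections=${POSTGRES_MAX_CONNECTIONS:-100}
```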

BYK avatar Jan 28 '24 18:01 BYK

I made the changes you suggested. Can you confirm? @BYK

erfantkerfan avatar Jan 30 '24 07:01 erfantkerfan

@erfantkerfan I'd add the default value (100 btw not 300 here: https://github.com/getsentry/self-hosted/blob/master/.env) and this should be good to go?

BYK avatar Jan 30 '24 11:01 BYK

@erfantkerfan I'd add the default value (100 btw not 300 here: https://github.com/getsentry/self-hosted/blob/master/.env) and this should be good to go?

Done. Changed to the default 100.

erfantkerfan avatar Jan 30 '24 13:01 erfantkerfan

I don't think this is necessary, how many active connections does your postgres currently have?

I'm getting the exact error telling me I have used up all my Postgres connections. I'm also running the latest self-hosted Sentry version, and this change fixed my problem when we had a very high load of Sentry requests.

erfantkerfan avatar Jan 30 '24 14:01 erfantkerfan

@erfantkerfan you still haven't added it to the .env file? Also, I'd defer to @aminvakil for further discussion

BYK avatar Jan 30 '24 16:01 BYK

Please paste the error, the command you used, and your active connections.

I still think this is an XY problem. Could you please clarify what you mean by "high load"? How many requests per second?

aminvakil avatar Jan 30 '24 23:01 aminvakil

Please paste the error, the command you used, and your active connections.

I still think this is an XY problem. Could you please clarify what you mean by "high load"? How many requests per second?

Let me recap: I normally see about 50 req/s, but during high-traffic hours it exceeds 120 req/s, and I was sometimes getting 500 status codes from Sentry. After reading the Sentry logs I found that Sentry was having trouble connecting to Postgres, and the Postgres logs showed this error message: sorry, too many clients already. My solution was to increase the Postgres connection limit, and since this might happen to other people, I pushed my change upstream. @aminvakil
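When that error appears, one way to check the headroom is a query like the following sketch, run from psql inside the postgres container:

```sql
-- Total sessions (active + idle) vs. the configured limit;
-- "sorry, too many clients already" fires when these meet.
SELECT COUNT(*) AS total_sessions,
       current_setting('max_connections')::int AS max_connections
FROM pg_stat_activity;
```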

you still haven't added it to the .env file?

No, I think the default of 100 will be sufficient for most people, and others who hit this problem will find the variable easily (since it's present in docker-compose.yml). But if you insist on adding it to the sample env file, that's okay. @BYK

erfantkerfan avatar Jan 31 '24 06:01 erfantkerfan

Please paste the error, the command you used, and your active connections. I still think this is an XY problem. Could you please clarify what you mean by "high load"? How many requests per second?

Let me recap: I normally see about 50 req/s, but during high-traffic hours it exceeds 120 req/s, and I was sometimes getting 500 status codes from Sentry. After reading the Sentry logs I found that Sentry was having trouble connecting to Postgres, and the Postgres logs showed this error message: sorry, too many clients already. My solution was to increase the Postgres connection limit, and since this might happen to other people, I pushed my change upstream. @aminvakil

Postgres is not good at handling lots of connections; that's why many PostgreSQL connection poolers exist and are used in production.

Also, we're handling about 800 rps without having to tweak PostgreSQL connections, which is why I don't think your "high load" justifies merging this PR.

you still haven't added it to the .env file?

No, I think the default of 100 will be sufficient for most people, and others who hit this problem will find the variable easily (since it's present in docker-compose.yml). But if you insist on adding it to the sample env file, that's okay. @BYK

Ok, aside from whether we merge this or not, it has to be added to .env; otherwise, why have an environment variable at all in the first place?

aminvakil avatar Jan 31 '24 10:01 aminvakil

@erfantkerfan - I want it in the .env file mostly for discoverability reasons. Otherwise you are right, technically we do not need to define it since we have the fallback value in place.

@aminvakil

Postgres is not good at handling lots of connections; that's why many PostgreSQL connection poolers exist and are used in production.

Yes, but if we say that, then I'd argue this repo should also bake in these kinds of things, such as pgbouncer, or at least provide easy ways to plug them in when needed. Right now we have none of that, so I think this patch is a move in the right direction.
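As a reference point, plugging in a pooler could be as small as this hypothetical pgbouncer.ini sketch (all names and values are illustrative; nothing here ships with this repo):

```ini
; Hypothetical pgbouncer.ini: many client connections multiplex
; over a small pool of real Postgres connections.
[databases]
sentry = host=postgres port=5432 dbname=sentry

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
```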

Also, we're handling about 800 rps without having to tweak PostgreSQL connections, which is why I don't think your "high load" justifies merging this PR.

This discrepancy is worth exploring but I'm not sure "we can handle this without changing the setting" is a great argument against adding the flex point to change the setting easily when people need it.

BYK avatar Jan 31 '24 12:01 BYK

@aminvakil

Postgres is not good at handling lots of connections; that's why many PostgreSQL connection poolers exist and are used in production.

Yes, but if we say that, then I'd argue this repo should also bake in these kinds of things, such as pgbouncer, or at least provide easy ways to plug them in when needed. Right now we have none of that, so I think this patch is a move in the right direction.

Correct.

Also, we're handling about 800 rps without having to tweak PostgreSQL connections, which is why I don't think your "high load" justifies merging this PR.

This discrepancy is worth exploring but I'm not sure "we can handle this without changing the setting" is a great argument against adding the flex point to change the setting easily when people need it.

My argument is that we should make sure the problem really is PostgreSQL's max_connections before adding a configuration option for it. For example, OP may have created too many workers, thereby opening too many connections to PostgreSQL.

From my experience, when you make these limits configurable, many users just raise them blindly, which makes their instances perform worse.

See the hundreds of tutorials that disable kernel open-file limits for no real reason; right now I see this PR going that way.

aminvakil avatar Jan 31 '24 15:01 aminvakil

My argument is that we should make sure the problem really is PostgreSQL's max_connections before adding a configuration option for it. For example, OP may have created too many workers, thereby opening too many connections to PostgreSQL.

Ah, that's a fair point. @erfantkerfan what else did you change from the default configuration we provided?

BYK avatar Jan 31 '24 22:01 BYK

My argument is that we should make sure the problem really is PostgreSQL's max_connections before adding a configuration option for it. For example, OP may have created too many workers, thereby opening too many connections to PostgreSQL.

Ah, that's a fair point. @erfantkerfan what else did you change from the default configuration we provided?

That's a fair point, but in the case of Postgres I have to disagree: most of the time it fails at handling large numbers of connections. Basically I'm using the default config; here is my diff:

diff --git a/.env b/.env
index 5fcf6ca..9dc4cf9 100644
--- a/.env
+++ b/.env
@@ -1,5 +1,5 @@
 COMPOSE_PROJECT_NAME=sentry-self-hosted
-SENTRY_EVENT_RETENTION_DAYS=90
+SENTRY_EVENT_RETENTION_DAYS=45
 # You can either use a port number or an IP:PORT combo for SENTRY_BIND
 # See https://docs.docker.com/compose/compose-file/#ports for more
 SENTRY_BIND=9000

I want it in the .env file mostly for discoverability reasons. Otherwise you are right, technically we do not need to define it since we have the fallback value in place.

I added it to the .env file with the default limit of 100 as well, in the hope that this PR looks okay now. @BYK

erfantkerfan avatar Feb 01 '24 01:02 erfantkerfan

Please paste the error, also please paste the command used and your active connections.

cd YOUR_SELF_HOSTED_INSTANCE
docker compose exec postgres su postgres
psql
SELECT COUNT(datid) FROM pg_stat_activity WHERE state = 'active';
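Since idle sessions also count toward max_connections, a per-state breakdown can be more telling than active sessions alone — a sketch to run in the same psql session:

```sql
-- Every row in pg_stat_activity holds a connection slot, regardless of state
SELECT state, COUNT(*) FROM pg_stat_activity GROUP BY state ORDER BY 2 DESC;
```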

aminvakil avatar Feb 01 '24 18:02 aminvakil

SELECT COUNT(datid) FROM pg_stat_activity WHERE state = 'active';

In non-peak hours:

 count 
-------
    11
(1 row)

In peak hours:

 count 
-------
    44
(1 row)

In the past couple of days, I have not seen connections above 100 (it's a situation that only happens about once or twice a month). Also, I think this query would be more suitable, since we are limited by the number of connections, not by the actual load on Postgres:

SELECT COUNT(datid) FROM pg_stat_activity;

@aminvakil

erfantkerfan avatar Feb 03 '24 04:02 erfantkerfan

SELECT COUNT(datid) FROM pg_stat_activity WHERE state = 'active';

In non-peak hours:

 count 
-------
    11
(1 row)

In peak hours:

 count 
-------
    44
(1 row)

Please be precise: is this the output of the command I pasted, or something else?

In the past couple of days, I have not seen connections above 100 (it's a situation that only happens about once or twice a month)

How do you know this?

Also, I think this query would be more suitable, since we are limited by the number of connections, not by the actual load on Postgres

Can you elaborate on this sentence? Is this PR about the number of connections or about the "actual load on the Postgres"? Also, does max_connections, which we're discussing in this PR, limit all connections (active + idle + ?) or only active connections?

SELECT COUNT(datid) FROM pg_stat_activity;

@aminvakil

And please do not mention me on every single comment, I'm watching this PR, thanks.

aminvakil avatar Feb 03 '24 15:02 aminvakil

Please be precise: is this the output of the command I pasted, or something else?

Yes sir.

In the past couple of days, I have not seen connections above 100 (it's a situation that only happens about once or twice a month)

How do you know this?

How do I know I have not gone above 100 connections? Because when that happens I get log lines in the container output which I can grep for.

How do I know this happens once or twice a month? Because I get 500 errors from Sentry about that often, and after investigating I see this max_connections error in both the Sentry container logs and the Postgres logs.

Can you elaborate on this sentence? Is this PR about the number of connections or about the "actual load on the Postgres"? Also, does max_connections, which we're discussing in this PR, limit all connections (active + idle + ?) or only active connections?

From what I understand from the docs and some blog posts, this value limits the combined number of connections, which is what this query counts:

SELECT COUNT(datid) FROM pg_stat_activity;
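For completeness, the limit itself (and the slots reserved for superusers, which ordinary sessions cannot use) can be read from psql:

```sql
SHOW max_connections;                 -- defaults to 100
SHOW superuser_reserved_connections;  -- defaults to 3, subtracted from the above
```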

erfantkerfan avatar Feb 03 '24 19:02 erfantkerfan

I still cannot see why this is the only instance that needs this fix; I explained my arguments against merging this PR in https://github.com/getsentry/self-hosted/pull/2740#issuecomment-1919327593.

I don't see anything in OP's later comments that changes my mind about this.

But I'm not a maintainer of this project; @hubertdeng123, please do as you see fit. I've made my arguments.

aminvakil avatar Feb 04 '24 12:02 aminvakil

Just reading up here, and I see valid points from both sides. FWIW, I don't think adding more connections is the best way to handle hitting the connection limit, since it can cause spikes in resource usage, but I also don't see the harm in adding this as a configurable option and surfacing it in the .env file. If this works as a solution for OP here, it may be useful to others as well.

Before we merge this, I do think it would be helpful to add a comment warning users of additional CPU/RAM usage if the maximum number of Postgres connections is raised. Could you do that, @erfantkerfan?
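The warning could read something like the following .env sketch (wording and variable name are illustrative, not the merged text):

```shell
# Caution: every Postgres connection consumes RAM and CPU. Raising this blindly
# can make your instance perform worse; check actual connection usage first.
POSTGRES_MAX_CONNECTIONS=100
```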

hubertdeng123 avatar Feb 28 '24 19:02 hubertdeng123

@hubertdeng123 Sure, it's done.

erfantkerfan avatar Feb 29 '24 20:02 erfantkerfan