timescaledb-tune

work_mem value can be bad

roger-rainey opened this issue on Mar 5, 2019 · 11 comments

The code that calculates work_mem can produce a value below the minimum threshold of 64kB, which prevents TimescaleDB/PostgreSQL from starting. I hit this deploying to a k8s node with 32 cores and 120GB of RAM: the output from timescaledb-tune was work_mem = 41kB, which causes PostgreSQL to fail to start.
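For context, PostgreSQL refuses to start if work_mem in postgresql.conf is below its 64kB floor, so any computed recommendation has to be clamped to that minimum. A minimal Go sketch of that kind of guard; the constant and helper names are illustrative, not the actual timescaledb-tune code:

```go
package main

import "fmt"

// minWorkMemKB is PostgreSQL's lower bound for work_mem (64kB).
// Emitting anything smaller makes the server refuse to start.
const minWorkMemKB = 64

// recommendWorkMem clamps a computed work_mem value (in kB) to the
// PostgreSQL minimum so the generated config can never block startup.
func recommendWorkMem(computedKB int) int {
	if computedKB < minWorkMemKB {
		return minWorkMemKB
	}
	return computedKB
}

func main() {
	// 41kB is the value reported above; the clamp bumps it to 64kB.
	fmt.Printf("work_mem = %dkB\n", recommendWorkMem(41))
}
```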

roger-rainey avatar Mar 05 '19 00:03 roger-rainey

Yikes, that's no good. Did you set any other flags I should know about?

Will definitely look to push a fix ASAP

RobAtticus avatar Mar 05 '19 02:03 RobAtticus

I ask because when I try

timescaledb-tune --memory=120GB --cpus=32

I get this for memory settings

Recommendations based on 120.00 GB of available memory and 32 CPUs for PostgreSQL 11
shared_buffers = 30GB
effective_cache_size = 90GB
maintenance_work_mem = 2047MB
work_mem = 19660kB

So I wonder how it ended up below 64kB

EDIT: Fixed memory flag

RobAtticus avatar Mar 05 '19 02:03 RobAtticus

No flags were added, and it gave me a value of 41kB, so it would not run.


roger-rainey avatar Mar 05 '19 02:03 roger-rainey

Happy to correct the issue with it returning invalid values, but I am a bit worried that it is misreading your system, since it should not be giving 41kB for the given parameters (120GB RAM, 32 CPUs).

It would be useful if you could run the following commands from inside the container:

cat /sys/fs/cgroup/memory/memory.limit_in_bytes

and

free -m | grep 'Mem' | awk '{print $2}'

Thanks for the bug report
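For anyone following along, those two numbers are the two places a container-aware tool can look for available memory: the cgroup limit and the host's total RAM. A rough Go sketch of that detection logic, using the cgroup v1 paths from the commands above; this is illustrative only, not timescaledb-tune's actual implementation:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// cgroupMemoryLimit reads the cgroup v1 memory limit for the current container.
func cgroupMemoryLimit() (uint64, error) {
	b, err := os.ReadFile("/sys/fs/cgroup/memory/memory.limit_in_bytes")
	if err != nil {
		return 0, err
	}
	return strconv.ParseUint(strings.TrimSpace(string(b)), 10, 64)
}

// systemMemory reads MemTotal from /proc/meminfo (the same source `free` uses)
// and returns it in bytes.
func systemMemory() (uint64, error) {
	b, err := os.ReadFile("/proc/meminfo")
	if err != nil {
		return 0, err
	}
	for _, line := range strings.Split(string(b), "\n") {
		if strings.HasPrefix(line, "MemTotal:") {
			fields := strings.Fields(line)
			kb, err := strconv.ParseUint(fields[1], 10, 64)
			if err != nil {
				return 0, err
			}
			return kb * 1024, nil
		}
	}
	return 0, fmt.Errorf("MemTotal not found in /proc/meminfo")
}

func main() {
	sys, _ := systemMemory()
	// An unconstrained cgroup reports an enormous limit, so only trust the
	// cgroup value when it is smaller than the host's total memory.
	if limit, err := cgroupMemoryLimit(); err == nil && limit < sys {
		fmt.Printf("using cgroup limit: %d bytes\n", limit)
		return
	}
	fmt.Printf("using system memory: %d bytes\n", sys)
}
```

If the cgroup file reports 268435456 here, every memory-based recommendation, including work_mem, is derived from 256MB rather than 120GB.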

RobAtticus avatar Mar 05 '19 03:03 RobAtticus

Here are the results of the commands you sent. Although this machine has a lot of resources, GKE probably slices it up differently, which produces those results.

bash-4.4$ cat /sys/fs/cgroup/memory/memory.limit_in_bytes

268435456

bash-4.4$ free -m | grep 'Mem' | awk '{print $2}'

120873

On Tue, Mar 5, 2019 at 8:52 AM RobAtticus [email protected] wrote:

Closed #38 https://github.com/timescale/timescaledb-tune/issues/38 via #39 https://github.com/timescale/timescaledb-tune/pull/39.


roger-rainey avatar Mar 05 '19 18:03 roger-rainey

This got closed by the merge but there seems to be another problem at play here.

Specifically, 268435456 is only ~268MB, so your settings are based off that rather than the 120GB actually available on the machine. Do you have any insight as to why the machine would be giving that as the limit?
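For what it's worth, that limit lines up with the 41kB exactly. The 120GB run above implies a ratio of roughly 6400 between available memory and work_mem (125829120kB / 19660kB ≈ 6400); 268435456 bytes is 262144kB, and 262144kB / 6400 ≈ 41kB. So, assuming work_mem scales roughly linearly with detected memory, the bad value is consistent with the tuner having picked up the ~256MB cgroup limit.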

RobAtticus avatar Mar 08 '19 18:03 RobAtticus

bump @roger-rainey

RobAtticus avatar Mar 11 '19 21:03 RobAtticus

The memory number was coming from the Kubernetes memory request, which is not the memory limit.


roger-rainey avatar Mar 12 '19 08:03 roger-rainey

@roger-rainey That's intriguing. The cgroup memory.limit_in_bytes value should refer to the maximum memory allocated to the container, not the minimum requested. I could see the two matching up if the node the container is scheduled on only has request_memory available, but with such a specific number that seems kind of weird.

Would you mind posting your k8s configuration for this pod, and any other information you have about resource utilization on the node your pod is on?

If it seems like there's still more than request_memory available, I might have to do a deeper dive on how the cgroups memory limits get set when request settings are specified in k8s.
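For reference, the stanza in question is the resources block of the container spec; the values below are purely illustrative, not the reporter's actual config:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: timescaledb
spec:
  containers:
    - name: timescaledb
      image: timescale/timescaledb:latest-pg11   # illustrative image tag
      resources:
        requests:
          memory: "256Mi"   # what the scheduler reserves; 256Mi is exactly 268435456 bytes
          cpu: "2"
        limits:
          memory: "96Gi"    # what cgroups enforces as memory.limit_in_bytes
          cpu: "32"
```

If limits.memory is set to, or defaulted to, the same value as requests.memory (some clusters default limits via a namespace LimitRange), the cgroup limit inside the container ends up at the request value, which would explain the 268435456 reading.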

LeeHampton avatar Mar 12 '19 16:03 LeeHampton

One other useful bit of info might be the output of

cat /sys/fs/cgroup/memory/memory.stat
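If you do grab that, the hierarchical_memory_limit line is the one worth looking at: on cgroup v1 it reflects the effective limit inherited from parent cgroups, which can be lower than the container's own memory.limit_in_bytes.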

RobAtticus avatar Mar 12 '19 16:03 RobAtticus

@roger-rainey Did you have any follow-up on this re: @LeeHampton 's comment?

RobAtticus avatar Apr 22 '19 20:04 RobAtticus