etcd-cloud-operator How to troubleshoot memory issues in etcd


Hi, I would like to know how we can troubleshoot memory issues in etcd, and how to mitigate them.

Jun 26 '24 09:06 iamnst19

Hey!

Like you said, you'd be looking at etcd itself, as the operator's own memory usage is going to be very minimal, so it's best to refer to the etcd repository / docs / code. Etcd is started as an embedded server within the etcd-cloud-operator process though, so at first it may look as if the operator is the one taking up memory.

Jun 26 '24 09:06 Quentin-M

I think the memory spike is due to the S3 backup. How do I disable the S3 backup? Also, how and where do I need to add profiling (https://github.com/google/pprof) to check the memory profile?

Jul 10 '24 18:07 iamnst19

The snapshot providers stream the data from etcd towards the snapshot destination, so I'd expect memory to be fine as long as everything is implemented correctly, unless etcd itself has a memory spike as part of the save somehow. Do you have a memory chart?

Disabling S3 snapshots is not recommended as this will cripple your ability to do disaster recovery, unless you enable the file backup provider with a separate and reliable storage to use. By default, the operator requires a snapshot provider.

To enable pprof, you'd want to inject it in the main here behind a command-line flag:

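import (
  "net/http"
  _ "net/http/pprof" // side effect: registers /debug/pprof/* on http.DefaultServeMux
)

// flagPprof is assumed to be a *string defined with flag.String (e.g. "localhost:6060");
// zap is assumed to be imported already by the main package.
if flagPprof != nil && *flagPprof != "" {
  go func() {
    zap.S().Infof("enabling pprof on %s", *flagPprof)
    // A nil handler means http.DefaultServeMux, where the pprof handlers live.
    if err := http.ListenAndServe(*flagPprof, nil); err != nil {
      zap.S().Errorf("pprof server stopped: %v", err)
    }
  }()
}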

Jul 10 '24 22:07 Quentin-M

Screenshot 2024-07-11 at 11 18 51 AM

The baseline has shifted and memory keeps growing, and I can see that these spikes happen during the backup to S3. Can I make an adjustment to this configuration?

snapshot:
    provider: s3 # This should be configured to S3 in any real environments.
    interval: 30m
    ttl: 24h

Should I increase the interval or reduce the TTL so that the backup is less aggressive? If so, what values would you recommend here?

Jul 11 '24 05:07 iamnst19

Ideally, this backup activity should happen during off-peak hours. How can I schedule the backup to run once a week during off-peak hours?

Jul 12 '24 10:07 iamnst19

Can you please help here?

Jul 23 '24 19:07 iamnst19
