
Memory retained after connections closed

Open bwangelme opened this issue 2 years ago • 5 comments

What version of go-control-plane are you using?

  • github.com/envoyproxy/go-control-plane v0.10.3-0.20221003170831-bf9fc1db9d0f
  • google.golang.org/protobuf v1.28.0
  • google.golang.org/grpc v1.45.0

What version of Go are you using (go version)

go version go1.18.5 linux/amd64

What did you do?

I wrote a control plane using go-control-plane, which sends config resources to Envoy.

	snapshot, err := cache.NewSnapshot(
		p.newSnapshotVersion(),
		map[resource.Type][]types.Resource{
			resource.ListenerType: x.ListenerContents(),
			resource.ClusterType:  x.ClusterContents(),
			resource.EndpointType: x.EndpointsContents(),
			resource.RouteType:    x.RoutesContents(),
		},
	)

I open 190 envoy proxy clients to connect to my control plane. The memory usage of my control plane is 10G. (Get memory usage by k8s metric container_memory_usage_bytes)


Then I closed all envoy proxy clients, but the memory usage remained high (4G).

What did you expect to see?

  1. the control plane uses less memory when serving 190 clients
  2. the control plane releases memory when all envoy clients are closed

What did you see instead?

  1. the control plane eats 10G of memory when serving 190 envoy clients; 10G is too much.
  2. the control plane did not release memory after I closed all the clients.

pprof

This is the pprof profile captured after I closed all envoy clients (4G memory usage).


pprof.gala.alloc_objects.alloc_space.inuse_objects.inuse_space.004.pb.gz

bwangelme avatar Oct 09 '22 07:10 bwangelme

Can you please let us know if you were using SOTW or Delta xDS?

alecholmez avatar Oct 28 '22 14:10 alecholmez

@alecholmez

I'm using the SOTW to return the config.

bwangelme avatar Oct 30 '22 02:10 bwangelme

This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.

github-actions[bot] avatar Nov 29 '22 04:11 github-actions[bot]

This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.

github-actions[bot] avatar Dec 06 '22 08:12 github-actions[bot]

@bwangelme I took a second look at this. The pprof graph you dropped shows only 105.75MB in use. Could something else in your management server be causing the high memory usage? We use this internally at greymatter and don't see usage like that; it's generally sub-1G.

alecholmez avatar Jan 31 '23 21:01 alecholmez

As mentioned by @alecholmez, those numbers do not match what we see from other users. Snapshots will not be cleaned from the cache unless explicitly removed, but that seems unrelated to the memory usage reported here. It's also unclear where the memory is used (within this library, or in the user code managing the configurations stored in this control plane).
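(Editor's note: for readers hitting the snapshot-retention point above, a one-line sketch; `snapshotCache` and `nodeID` are assumed names from the surrounding management-server code:)

```go
// Snapshots stay in the SnapshotCache until explicitly removed, so drop
// them once a node is known to be gone for good.
snapshotCache.ClearSnapshot(nodeID)
```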

valerian-roche avatar Mar 27 '24 22:03 valerian-roche

Hi, we finally identified the root cause of the issue.

	source.ConfigSourceSpecifier = &core.ConfigSource_ApiConfigSource{
		ApiConfigSource: &core.ApiConfigSource{
			TransportApiVersion:       resource.DefaultAPIVersion,
			ApiType:                   core.ApiConfigSource_GRPC,
			SetNodeOnFirstMessageOnly: true,
			GrpcServices: []*core.GrpcService{{
				TargetSpecifier: &core.GrpcService_EnvoyGrpc_{
					EnvoyGrpc: &core.GrpcService_EnvoyGrpc{ClusterName: "xds_cluster"},
				},
			}},
		},
	}

I set GrpcServices on EdsClusterConfig, which caused every envoy cluster to open its own connection to the xds service.

We have approximately 200 envoy pods with 400 clusters each (so roughly 80,000 xDS streams), all connecting to a single xds service, resulting in high memory consumption on the xds service pod.
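(Editor's note: one common mitigation for per-cluster streams, sketched here under the assumption that the management server also serves the aggregated ADS endpoint, is to point EDS at the shared ADS stream instead of a dedicated gRPC ApiConfigSource:)

```go
// Hedged sketch: use the aggregated (ADS) stream for EDS so all clusters
// on an envoy share one connection, instead of one stream per cluster.
source := &core.ConfigSource{
	ResourceApiVersion: resource.DefaultAPIVersion,
	ConfigSourceSpecifier: &core.ConfigSource_Ads{
		Ads: &core.AggregatedConfigSource{},
	},
}
```

With ADS, the stream count scales with the number of envoy pods rather than pods × clusters.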

Thanks for your reply.

bwangelme avatar Mar 28 '24 03:03 bwangelme

Thanks for your reply. Yes, when not using ADS, envoy uses a separate stream per cluster. Also, because the node is only sent on the first message, the go-control-plane library ends up keeping a copy of the node metadata per stream, which can be very impactful if a lot of data is included in there.

valerian-roche avatar Mar 28 '24 03:03 valerian-roche