Bug: Increasing Pod Memory Usage for Push Service

Open Sylariam opened this issue 1 year ago • 5 comments

What happened?

Server version 3.5.0. The push service's memory usage keeps growing until it consumes too much memory. Possible memory leak? [image]

What did you expect to happen?

stable memory usage

How can we reproduce it (as minimally and precisely as possible)?

The graph comes from Prometheus metrics, queried with `sum (container_memory_working_set_bytes{image!="",pod_name=~"$Pod",namespace="$namespace"}) by (pod_name)`.

Anything else we need to know?

No response

version

```console
$ {name} version
# paste output here
```

Cloud provider

OS version

```console
# On Linux:
$ cat /etc/os-release
# paste output here

$ uname -a
# paste output here

# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here
```

Install tools

Sylariam, Feb 22 '24 06:02

This seems a bit unusual. @FGadvancer

cubxxw, Feb 22 '24 12:02

Update: after integrating the push service with Pyroscope and running it for a week, I got these stats:

[image: img_v3_028l_47679782-4626-4fe5-9aa3-bfd557f6511g]

It looks like the push service acquires a lot of gRPC conns while pushing, so I checked the code:

```go
func (p *Pusher) k8sOnlinePush(ctx context.Context, msg *sdkws.MsgData, pushToUserIDs []string) (wsResults []*msggateway.SingleMsgToUserResults, err error) {
	for host, userIds := range usersHost {
		tconn, _ := p.discov.GetConn(ctx, host)
		usersConns[tconn] = userIds
	}
```

Every push triggers this `p.discov.GetConn` call for each host, which appears to be what drives the memory usage up.
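For anyone who wants a rough first check before wiring up Pyroscope, the Go runtime's own counters can show whether live heap keeps growing between two points in time. The following is only a minimal sketch using the standard library: `memSample`, `sample`, `grew`, and the `retained` slice are hypothetical names made up for illustration and are not part of open-im-server.

```go
package main

import (
	"fmt"
	"runtime"
)

// memSample is a hypothetical record type for this sketch.
type memSample struct {
	HeapAllocBytes uint64 // bytes of allocated heap objects
	HeapObjects    uint64 // number of allocated heap objects
	Goroutines     int    // number of goroutines at sample time
}

// sample reads the Go runtime's memory counters.
func sample() memSample {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return memSample{
		HeapAllocBytes: m.HeapAlloc,
		HeapObjects:    m.HeapObjects,
		Goroutines:     runtime.NumGoroutine(),
	}
}

// grew reports whether the allocated heap grew by more than thresholdBytes
// between the two samples.
func grew(before, after memSample, thresholdBytes uint64) bool {
	return after.HeapAllocBytes > before.HeapAllocBytes+thresholdBytes
}

// retained stands in for objects that are allocated and never released.
var retained [][]byte

func main() {
	before := sample()
	for i := 0; i < 64; i++ {
		retained = append(retained, make([]byte, 1<<20)) // keep 1 MiB per iteration
	}
	after := sample()
	fmt.Println("heap grew by more than 32 MiB:", grew(before, after, 32<<20))
}
```

In a real service the two samples would be taken minutes or hours apart and logged or exported. This only says that the heap is growing, not where; a profiler such as Pyroscope is still needed to attribute the growth to a call site like `GetConn`.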

Sylariam, Mar 06 '24 03:03

I made a temporary workaround:

```go
var usersConns = make(map[*grpc.ClientConn][]string)
for host, userIds := range usersHost {
	//tconn, _ := p.discov.GetConn(ctx, host)
	//usersConns[tconn] = userIds

	if conn, ok := onlinePusherConnMap[host]; ok {
		log.ZDebug(ctx, "DEBUG reuse local conn", "host", host)
		usersConns[conn] = userIds
	} else {
		log.ZDebug(ctx, "DEBUG no valid local conn", "host", host)
		tconn, _ := p.discov.GetConn(ctx, host)
		usersConns[tconn] = userIds
		onlinePusherConnMu.Lock()
		//defer onlinePusherConnMu.Unlock()
		log.ZDebug(ctx, "DEBUG add to local conn", "host", host)
		onlinePusherConnMap[host] = tconn
		onlinePusherConnMu.Unlock()
	}
}
```

This reuses a locally cached conn for the host if one exists; otherwise it calls GetConn and stores the result. I'm not sure whether this is a safe solution. WDYT?
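On the safety question: the workaround reads `onlinePusherConnMap` without holding `onlinePusherConnMu`, and in Go a map read that races with a map write is a data race that can crash the process. It also discards the error from `GetConn`. Below is a minimal sketch of one way to make a per-host cache safe for concurrent use, not the project's actual fix. It is self-contained so it can be run on its own: `conn` is a hypothetical stand-in for `*grpc.ClientConn`, `dial` is an assumed wrapper around `p.discov.GetConn`, and `connCache` and `countDials` are names invented for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// conn is a hypothetical stand-in for *grpc.ClientConn so the sketch runs
// without external dependencies.
type conn struct{ host string }

// connCache keeps one connection per host and is safe for concurrent use.
type connCache struct {
	mu    sync.RWMutex
	conns map[string]*conn
	dial  func(host string) (*conn, error) // assumed wrapper around p.discov.GetConn
}

func newConnCache(dial func(string) (*conn, error)) *connCache {
	return &connCache{conns: make(map[string]*conn), dial: dial}
}

// get returns the cached connection for host, dialing at most once per host.
func (c *connCache) get(host string) (*conn, error) {
	// Fast path: read under the read lock.
	c.mu.RLock()
	cc, ok := c.conns[host]
	c.mu.RUnlock()
	if ok {
		return cc, nil
	}

	c.mu.Lock()
	defer c.mu.Unlock()
	// Re-check: another goroutine may have dialed while we waited for the lock.
	if cc, ok := c.conns[host]; ok {
		return cc, nil
	}
	cc, err := c.dial(host)
	if err != nil {
		return nil, err // failures are not cached
	}
	c.conns[host] = cc
	return cc, nil
}

// countDials simulates `pushes` concurrent pushes to every host and returns
// how many times the dial function actually ran.
func countDials(hosts []string, pushes int) int {
	var mu sync.Mutex
	dials := 0
	cache := newConnCache(func(host string) (*conn, error) {
		mu.Lock()
		dials++
		mu.Unlock()
		return &conn{host: host}, nil
	})

	var wg sync.WaitGroup
	for i := 0; i < pushes; i++ {
		for _, h := range hosts {
			wg.Add(1)
			go func(h string) {
				defer wg.Done()
				_, _ = cache.get(h)
			}(h)
		}
	}
	wg.Wait()
	return dials
}

func main() {
	// 100 pushes to 2 hosts should dial each host only once.
	fmt.Println(countDials([]string{"gateway-0", "gateway-1"}, 100)) // prints 2
}
```

Two caveats with this approach. The write lock is held while dialing, so a slow dial to one host briefly blocks lookups for the others. And a cached conn can go stale when a gateway pod is rescheduled, so a real implementation would also need some way to evict and close such entries.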

Sylariam, Mar 06 '24 03:03

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 7 days.

kubbot, May 13 '24 08:05

This issue has been fixed in release-v3.8. I recommend updating to the new version. If you run into any new issues, please reopen this issue or create a new one.

skiffer-git, Sep 27 '24 11:09