open-im-server

Bug: online user count and memory usage only ever increase, and there are a large number of TCP connections in the FIN-WAIT-2 state

Open manlinux opened this issue 1 year ago • 4 comments

What happened?

On the OpenIM Grafana monitoring page, we can see that the online_user count and user_login_total keep increasing, and they remain high even in the early morning.
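The dashboard numbers can be compared against what Prometheus itself returns. Below is a minimal Go sketch that queries the Prometheus HTTP API (/api/v1/query) directly. It rests on two assumptions, and either may differ in a given deployment:

- Prometheus listens on http://localhost:9090 (hypothetical).
- The series are exported under the names used in this issue.

By Prometheus naming convention, a metric ending in _total is a counter, and a counter only ever increases. For that reason the sketch queries the per-second rate of user_login_total rather than its raw value. online_user, presumably a gauge, is queried as-is.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strconv"
)

// promResponse models the subset of a Prometheus instant-query response used here.
type promResponse struct {
	Status string `json:"status"`
	Data   struct {
		ResultType string `json:"resultType"`
		Result     []struct {
			Metric map[string]string `json:"metric"`
			Value  [2]interface{}    `json:"value"` // [unix timestamp (number), sample value (string)]
		} `json:"result"`
	} `json:"data"`
}

// instantQueryURL builds an /api/v1/query URL for expr against a Prometheus base URL.
func instantQueryURL(base, expr string) string {
	return base + "/api/v1/query?" + url.Values{"query": {expr}}.Encode()
}

// parseVector returns the float value of every series in an instant-vector response.
func parseVector(body []byte) ([]float64, error) {
	var r promResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, err
	}
	if r.Status != "success" {
		return nil, fmt.Errorf("query status %q", r.Status)
	}
	var out []float64
	for _, s := range r.Data.Result {
		str, ok := s.Value[1].(string)
		if !ok {
			return nil, fmt.Errorf("unexpected sample value %v", s.Value[1])
		}
		v, err := strconv.ParseFloat(str, 64)
		if err != nil {
			return nil, err
		}
		out = append(out, v)
	}
	return out, nil
}

func main() {
	// Hypothetical address; use the Prometheus instance behind the Grafana dashboard.
	const base = "http://localhost:9090"
	// Metric names as written in this issue; the exported names may differ.
	for _, expr := range []string{"online_user", "rate(user_login_total[5m])"} {
		resp, err := http.Get(instantQueryURL(base, expr))
		if err != nil {
			fmt.Println(expr, "request failed:", err)
			continue
		}
		body, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			fmt.Println(expr, "read failed:", err)
			continue
		}
		vals, err := parseVector(body)
		if err != nil {
			fmt.Println(expr, "parse failed:", err)
			continue
		}
		fmt.Println(expr, vals)
	}
}
```

Comparing the online_user value with the number of ESTABLISHED sockets on port 10001 would show whether the gauge tracks real connections.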

The system also has many TCP connections to the OpenIM server (port 10001) in the FIN_WAIT2 state.

[Three screenshots attached to the original issue are not reproduced here.]

What did you expect to happen?

We'd like to understand why this happens and how to fix it. Ideally, the dashboard should show the real number of online users, and memory should be released efficiently.

How can we reproduce it (as minimally and precisely as possible)?

https://github.com/openimsdk/openim-docker/blob/v3.5.0/docker-compose.yaml

The Dockerfile for openim-oimws:


```dockerfile
# Build Stage
FROM golang:1.20 AS builder

# Set go mod installation source and proxy
ARG GO111MODULE=on
ARG GOPROXY=https://goproxy.cn,direct
ENV GO111MODULE=$GO111MODULE
ENV GOPROXY=$GOPROXY

# Set up the working directory
WORKDIR /openim/openim-server

# Download dependencies first so this layer stays cached when only sources change
COPY go.mod go.sum ./
RUN go mod download

# Copy all files to the container
ADD . .

RUN make clean
RUN make build

FROM ghcr.io/openim-sigs/openim-ubuntu-image:latest

# No matching ARGs are declared in this stage, so these expand to empty strings at
# build time; the real values must be supplied at runtime (e.g. by docker-compose).
ENV OPENIM_API_IP=${OPENIM_API_IP}
ENV OPENIM_API_PORT=${OPENIM_API_PORT}
ENV OPENIM_WS_IP=${OPENIM_WS_IP}
ENV OPENIM_WS_PORT=${OPENIM_WS_PORT}
ENV SDK_WS_PORT=${SDK_WS_PORT}
ENV OPENIM_LOG_LEVEL=${OPENIM_LOG_LEVEL}
ENV OPENIM_DB_DIR=${OPENIM_DB_DIR}

WORKDIR /app
COPY --from=builder /openim/openim-server/_output/bin/ /app/bin/

# Shell-form CMD so the ${...} variables are expanded when the container starts
CMD /app/bin/main \
    -openIM_api_address="http://${OPENIM_API_IP}:${OPENIM_API_PORT}" \
    -openIM_ws_address="ws://${OPENIM_WS_IP}:${OPENIM_WS_PORT}" \
    -sdk_ws_port=${SDK_WS_PORT} \
    -openIM_log_level=${OPENIM_LOG_LEVEL} \
    -openIMDbDir="${OPENIM_DB_DIR}"
```

Anything else we need to know?

No response

version

openim-server 3.5: docker pull openim/openim-server:release-v3.5
openim-chat release-v1.5: docker pull openim/openim-chat:release-v1.5
openim-oimws v3.5.1-alpha.8: https://github.com/openim-sigs/oimws/tree/v3.5.1-alpha.8

Cloud provider

OS version

NAME="Oracle Linux Server" VERSION="7.8"

Linux OpenIMServer 4.14.35-1902.300.11.el7uek.x86_64

Install tools

manlinux, Mar 26 '24 05:03

Hello! Thank you for filing an issue.

If this is a bug report, please include relevant logs to help us debug the problem.

Join Slack 🤖 to connect and communicate with our developers.

kubbot, Mar 26 '24 05:03

Could you please confirm if this Dockerfile was authored by you? Additionally, would you be able to provide specific version information used? Thank you.

cubxxw, Mar 26 '24 06:03

Yes. I wrote the Dockerfile for openim-oimws myself.

Dependency versions: mariadb:10.6, kafka:3.5.1, mongo:6.0.2, redis:7.0.0

openim-server image version: release-v3.5
openim-chat image version: release-v1.5
openim-admin: toc-base-open-docker.35
oimws git branch: v3.5.1-alpha.8

Additionally, I applied the following kernel parameters (sysctl) on the Linux host:
net.core.somaxconn = 65535
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_syn_retries = 2
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_fastopen = 3
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 6
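It is worth confirming that these values are actually in effect where the connections live. Below is a minimal Go sketch that reads each value back from /proc/sys, the same files sysctl reads. The expected values are copied from the list above.

Many net.* parameters are per network namespace, so the values seen inside a container can differ from the host's. Running the sketch both on the host and inside the relevant container would show any gap.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// sysctlPath maps a dotted sysctl key to its file under /proc/sys,
// e.g. net.ipv4.tcp_fin_timeout -> /proc/sys/net/ipv4/tcp_fin_timeout.
func sysctlPath(key string) string {
	return "/proc/sys/" + strings.ReplaceAll(key, ".", "/")
}

// parseSetting splits "key = value" or "key=value" into trimmed parts.
func parseSetting(s string) (key, value string, ok bool) {
	k, v, found := strings.Cut(s, "=")
	if !found {
		return "", "", false
	}
	return strings.TrimSpace(k), strings.TrimSpace(v), true
}

func main() {
	// Expected values as listed in this issue.
	expected := []string{
		"net.core.somaxconn = 65535",
		"net.ipv4.tcp_tw_reuse = 1",
		"net.ipv4.tcp_syncookies = 1",
		"net.ipv4.tcp_syn_retries = 2",
		"net.ipv4.tcp_fin_timeout = 10",
		"net.ipv4.tcp_max_syn_backlog = 8192",
		"net.ipv4.tcp_fastopen = 3",
		"net.ipv4.tcp_slow_start_after_idle = 0",
		"net.ipv4.tcp_keepalive_time = 60",
		"net.ipv4.tcp_keepalive_probes = 3",
		"net.ipv4.tcp_keepalive_intvl = 6",
	}
	for _, s := range expected {
		key, want, ok := parseSetting(s)
		if !ok {
			continue
		}
		data, err := os.ReadFile(sysctlPath(key))
		if err != nil {
			fmt.Printf("%-38s unreadable: %v\n", key, err)
			continue
		}
		got := strings.TrimSpace(string(data))
		status := "ok"
		if got != want {
			status = "MISMATCH"
		}
		fmt.Printf("%-38s want=%-6s got=%-6s %s\n", key, want, got, status)
	}
}
```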

Running "netstat -nat | grep FIN_WAIT2 | wc -l" on the Linux host currently reports about 28245 connections in the FIN_WAIT2 state.

The openim-oimws logs contain warnings like these:
openim-oimws | 2024-03-26 19:35:02.156 WARN [PID:7] [interaction/long_conn_mgr.go:186] reConn {"error": "read tcp 10.195.1.12:35564->10.195.1.1:10001: read: connection reset by peer"}
openim-oimws | 2024-03-26 19:35:02.160 WARN [PID:7] [interaction/long_conn_mgr.go:186] reConn {"operationID": "1711449856320880343", "error": "read tcp 10.195.1.12:35568->10.195.1.1:10001: read: connection reset by peer"}
openim-oimws | 2024-03-26 19:35:02.160 WARN [PID:7] [interaction/long_conn_mgr.go:186] reConn {"error": "read tcp 10.195.1.12:35566->10.195.1.1:10001: read: connection reset by peer"}
openim-oimws | 2024-03-26 19:35:02.170 WARN [PID:7] [interaction/long_conn_mgr.go:186] reConn {"error": "read tcp 10.195.1.12:35570->10.195.1.1:10001: read: connection reset by peer"}
openim-oimws | 2024-03-26 19:35:02.170 WARN [PID:7] [interaction/long_conn_mgr.go:186] reConn {"operationID": "1711449858558726627", "error": "read tcp 10.195.1.12:35574->10.195.1.1:10001: read: connection reset by peer"}
openim-oimws | 2024-03-26 19:35:02.170 WARN [PID:7] [interaction/long_conn_mgr.go:186] reConn {"operationID": "1711449856066444155", "error": "read tcp 10.195.1.12:35576->10.195.1.1:10001: read: connection reset by peer"}

manlinux, Mar 27 '24 01:03

I recommend you update to release-v3.8. If you encounter any new issues, please reopen this issue or create a new one.

skiffer-git, Sep 29 '24 06:09