Bug: online user count and memory usage only increase, and a large number of TCP connections stay in the FIN_WAIT2 state
What happened?
On the Grafana monitoring page of openim, online_user num and user_login_total increase continuously, and the numbers remain high even in the early morning.
On the system, many TCP connections to the openim server (port 10001) are in the FIN_WAIT2 state.
What did you expect to happen?
We'd like to understand why this happens and how to fix it, so that the monitoring shows the real number of online users and memory is released properly.
How can we reproduce it (as minimally and precisely as possible)?
https://github.com/openimsdk/openim-docker/blob/v3.5.0/docker-compose.yaml
the Dockerfile of openim-oimws:
# Build Stage
FROM golang:1.20 AS builder
# Set go mod installation source and proxy
ARG GO111MODULE=on
ARG GOPROXY=https://goproxy.cn,direct
ENV GO111MODULE=$GO111MODULE
ENV GOPROXY=$GOPROXY
# Set up the working directory
WORKDIR /openim/openim-server
COPY go.mod go.sum ./
RUN go mod download
# Copy all files to the container
ADD . .
RUN make clean
RUN make build
FROM ghcr.io/openim-sigs/openim-ubuntu-image:latest
ENV OPENIM_API_IP ${OPENIM_API_IP}
ENV OPENIM_API_PORT ${OPENIM_API_PORT}
ENV OPENIM_WS_IP ${OPENIM_WS_IP}
ENV OPENIM_WS_PORT ${OPENIM_WS_PORT}
ENV SDK_WS_PORT ${SDK_WS_PORT}
ENV OPENIM_LOG_LEVEL ${OPENIM_LOG_LEVEL}
ENV OPENIM_DB_DIR ${OPENIM_DB_DIR}
WORKDIR /app
COPY --from=builder /openim/openim-server/_output/bin/ /app/bin/
CMD /app/bin/main -openIM_api_address="http://${OPENIM_API_IP}:${OPENIM_API_PORT}" -openIM_ws_address="ws://${OPENIM_WS_IP}:${OPENIM_WS_PORT}" -sdk_ws_port=${SDK_WS_PORT} -openIM_log_level=${OPENIM_LOG_LEVEL} -openIMDbDir="${OPENIM_DB_DIR}"
Anything else we need to know?
No response
version
openim server 3.5: docker pull openim/openim-server:release-v3.5
openim-chat release-v1.5: docker pull openim/openim-chat:release-v1.5
openim-oimws v3.5.1-alpha.8: https://github.com/openim-sigs/oimws/tree/v3.5.1-alpha.8
Cloud provider
OS version
Linux OpenIMServer 4.14.35-1902.300.11.el7uek.x86_64
Install tools
Could you please confirm if this Dockerfile was authored by you? Additionally, would you be able to provide specific version information used? Thank you.
Yes. The Dockerfile of openim-oimws was authored by me.
mariadb:10.6, kafka:3.5.1, mongo:6.0.2, redis:7.0.0
openim server image version: release-v3.5
openim chat image version: release-v1.5
openim-admin:toc-base-open-docker.35
oimws git branch: v3.5.1-alpha.8
Additionally, I applied the following sysctl settings on the Linux system:
net.core.somaxconn = 65535
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_syn_retries = 2
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_fastopen = 3
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 6
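The settings above were applied on the host, and the thread does not say whether the containers see the same values. Many `net.*` sysctls on Linux are scoped per network namespace, so a container that does not use host networking may run with different values. The following is a minimal sketch for checking this, assuming it is run inside the container in question. The helper names (`parse_sysctl_settings`, `sysctl_path`, `read_sysctl`, `find_mismatches`) are hypothetical and not part of OpenIM.

```python
from pathlib import Path

# Settings copied from the comment above, one "key = value" pair per line.
SETTINGS = """\
net.core.somaxconn = 65535
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_syn_retries = 2
net.ipv4.tcp_fin_timeout = 10
net.ipv4.tcp_max_syn_backlog = 8192
net.ipv4.tcp_fastopen = 3
net.ipv4.tcp_slow_start_after_idle = 0
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 6
"""


def parse_sysctl_settings(text):
    """Parse 'key = value' lines into a dict of strings."""
    settings = {}
    for line in text.splitlines():
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def sysctl_path(key, root="/proc/sys"):
    """Map a dotted sysctl name to its file under /proc/sys."""
    return Path(root) / key.replace(".", "/")


def read_sysctl(key, root="/proc/sys"):
    """Return the current value as a string, or None if it cannot be read."""
    try:
        return sysctl_path(key, root).read_text().strip()
    except OSError:
        return None


def find_mismatches(expected, root="/proc/sys"):
    """Return {key: (expected, actual)} for every value that differs."""
    mismatches = {}
    for key, want in expected.items():
        got = read_sysctl(key, root)
        if got != want:
            mismatches[key] = (want, got)
    return mismatches


if __name__ == "__main__":
    for key, (want, got) in find_mismatches(parse_sysctl_settings(SETTINGS)).items():
        print(f"{key}: expected {want}, found {got}")
```

Running it once on the host and once inside each container would show whether the tuned values, `net.ipv4.tcp_fin_timeout` in particular, are actually in effect where the connections live.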
When I run "netstat -nat | grep FIN_WAIT2 | wc -l" on the Linux console, the number of TCP connections in the FIN_WAIT2 state is currently about 28245.
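The single FIN_WAIT2 count can be broken down further. The sketch below tallies every TCP state for connections that involve one port, which helps to see whether FIN_WAIT2 dominates on port 10001 specifically. It assumes the usual six-column `netstat -nat` layout; the function name `count_tcp_states` is hypothetical, and the sample rows are illustrative, not captured from the affected host.

```python
from collections import Counter


def count_tcp_states(netstat_output, port):
    """Tally TCP states for rows whose local or foreign address uses `port`."""
    states = Counter()
    wanted = str(port)
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows: proto, recv-q, send-q, local address, foreign address, state.
        # Header rows do not start with "tcp" and are skipped.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local_addr, foreign_addr, state = fields[3], fields[4], fields[5]
        ports = {local_addr.rsplit(":", 1)[-1], foreign_addr.rsplit(":", 1)[-1]}
        if wanted in ports:
            states[state] += 1
    return states


# Illustrative sample only; real input would be the text of `netstat -nat`.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.195.1.1:10001        10.195.1.12:35564       FIN_WAIT2
tcp        0      0 10.195.1.1:10001        10.195.1.12:35566       FIN_WAIT2
tcp        0      0 10.195.1.1:10001        10.195.1.12:35568       ESTABLISHED
tcp        0      0 10.195.1.1:22           10.195.1.50:51000       ESTABLISHED
"""

if __name__ == "__main__":
    for state, count in count_tcp_states(SAMPLE, 10001).most_common():
        print(state, count)
```

Comparing the tallies taken on both ends of the connection (the openim server and the openim-oimws container) would show which side initiated the close and which side has not yet finished closing.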
There are some warning logs in the openim-oimws log files, such as these:
openim-oimws | 2024-03-26 19:35:02.156 WARN [PID:7] [interaction/long_conn_mgr.go:186] reConn {"error": "read tcp 10.195.1.12:35564->10.195.1.1:10001: read: connection reset by peer"}
openim-oimws | 2024-03-26 19:35:02.160 WARN [PID:7] [interaction/long_conn_mgr.go:186] reConn {"operationID": "1711449856320880343", "error": "read tcp 10.195.1.12:35568->10.195.1.1:10001: read: connection reset by peer"}
openim-oimws | 2024-03-26 19:35:02.160 WARN [PID:7] [interaction/long_conn_mgr.go:186] reConn {"error": "read tcp 10.195.1.12:35566->10.195.1.1:10001: read: connection reset by peer"}
openim-oimws | 2024-03-26 19:35:02.170 WARN [PID:7] [interaction/long_conn_mgr.go:186] reConn {"error": "read tcp 10.195.1.12:35570->10.195.1.1:10001: read: connection reset by peer"}
openim-oimws | 2024-03-26 19:35:02.170 WARN [PID:7] [interaction/long_conn_mgr.go:186] reConn {"operationID": "1711449858558726627", "error": "read tcp 10.195.1.12:35574->10.195.1.1:10001: read: connection reset by peer"}
openim-oimws | 2024-03-26 19:35:02.170 WARN [PID:7] [interaction/long_conn_mgr.go:186] reConn {"operationID": "1711449856066444155", "error": "read tcp 10.195.1.12:35576->10.195.1.1:10001: read: connection reset by peer"}
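To see how often these reconnects occur and against which endpoint, the warning lines can be summarized with a small script. This is a sketch that assumes the log format shown in the lines quoted in this comment (a `reConn` marker followed by a JSON payload with an `error` field); the name `summarize_reconnects` is hypothetical, and the two sample lines are copied from the logs in this thread.

```python
import json
import re
from collections import Counter

# JSON payload that follows "reConn" at the end of a WARN line.
RECONN_RE = re.compile(r"reConn\s+(\{.*\})\s*$")
# "read tcp <local>-><remote>: <reason>" inside the error string.
ERROR_RE = re.compile(r"read tcp (\S+)->(\S+): (.+)")


def summarize_reconnects(log_lines):
    """Count reConn warnings per (remote address, error reason)."""
    counts = Counter()
    for line in log_lines:
        match = RECONN_RE.search(line)
        if not match:
            continue
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip lines whose payload is truncated or malformed
        error = ERROR_RE.search(payload.get("error", ""))
        if error:
            _local, remote, reason = error.groups()
            counts[(remote, reason)] += 1
    return counts


# Two of the lines quoted in this thread.
SAMPLE_LINES = [
    'openim-oimws | 2024-03-26 19:35:02.156 WARN [PID:7] [interaction/long_conn_mgr.go:186] reConn {"error": "read tcp 10.195.1.12:35564->10.195.1.1:10001: read: connection reset by peer"}',
    'openim-oimws | 2024-03-26 19:35:02.160 WARN [PID:7] [interaction/long_conn_mgr.go:186] reConn {"operationID": "1711449856320880343", "error": "read tcp 10.195.1.12:35568->10.195.1.1:10001: read: connection reset by peer"}',
]

if __name__ == "__main__":
    for (remote, reason), count in summarize_reconnects(SAMPLE_LINES).most_common():
        print(remote, reason, count)
```

Grouping by minute instead of by endpoint would be a natural extension if the goal is to correlate reconnect bursts with the growth of online_user num.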
I recommend you update to release-v3.8. If you encounter any new issues, please reopen this issue or create a new one.