[Bug] 4.4.2 is very resource-hungry: CPU and memory rise linearly over time; 4.3.2 does not have this problem

Open XRJ1230663 opened this issue 1 year ago • 56 comments

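Problem description

4.4.2 is very resource-hungry: CPU and memory rise linearly over time. 4.3.2 does not have this problem.

Version

4.4.2

Steps to reproduce

4.4.2 is very resource-hungry: CPU and memory rise linearly over time. 4.3.2 does not have this problem.

Expected result

Fix the high load, or allow installing a specific version.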

XRJ1230663 avatar Mar 11 '24 06:03 XRJ1230663

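I took a look. This morning it was unbearably sluggish, with fvm using 25% of memory on a 2 GB server. I asked around and was told a fix had been released over the weekend. After upgrading this morning things were fine, with memory usage at about 3%, but it has now reached about 8% after roughly 5 hours. The upgrade was from 4.4.1 to 4.4.2.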

image

QYG2297248353 avatar Mar 11 '24 07:03 QYG2297248353

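I took a look. This morning it was unbearably sluggish, with fvm using 25% of memory on a 2 GB server. I asked around and was told a fix had been released over the weekend. After upgrading this morning things were fine, with memory usage at about 3%, but it has now reached about 8% after roughly 5 hours. The upgrade was from 4.4.1 to 4.4.2.

image

I upgraded to 4.4.2 last night, but it still uses a lot of resources: a 2-core 4 GB server dedicated to it is almost completely used up. image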

XRJ1230663 avatar Mar 11 '24 10:03 XRJ1230663

@XRJ1230663 Could you check which service specifically is using the resources?

xbingW avatar Mar 12 '24 02:03 xbingW
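One way to answer the question of which service is using the resources is to take a single `docker stats` snapshot and sort it by memory share. The sketch below is only an illustration, not part of SafeLine: the function names are made up for this example, and it assumes the docker CLI is available on the host.

```python
import subprocess

# Go-template format for `docker stats`; fields are tab-separated, no header row.
STATS_FORMAT = "{{.Name}}\t{{.CPUPerc}}\t{{.MemPerc}}"


def parse_stats(output):
    """Parse tab-separated `docker stats` lines into (name, cpu %, mem %) tuples."""
    rows = []
    for line in output.splitlines():
        parts = line.strip().split("\t")
        if len(parts) != 3:
            continue  # skip blank or unexpected lines
        name, cpu, mem = parts
        rows.append((name, float(cpu.rstrip("%")), float(mem.rstrip("%"))))
    # Highest memory share first, so the heaviest container is at the top.
    return sorted(rows, key=lambda row: row[2], reverse=True)


def current_stats():
    """Take one snapshot of all running containers (hypothetical helper)."""
    result = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", STATS_FORMAT],
        capture_output=True, text=True, check=True,
    )
    return parse_stats(result.stdout)
```

Calling `current_stats()` on the host returns the containers ordered by memory percentage. Running it every few minutes and keeping the results shows whether one container keeps growing.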

@QYG2297248353 Is fvm still climbing?

xbingW avatar Mar 12 '24 02:03 xbingW

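@QYG2297248353 Is fvm still climbing?

It stopped climbing after that and has stayed between 8% and 9% for a day now. That is still a bit high, higher than a Spring Boot application, so I suggest optimizing it. But it is better than before: at least I no longer get alerts about slow access in the middle of the night.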

image

QYG2297248353 avatar Mar 12 '24 04:03 QYG2297248353

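I took a look. This morning it was unbearably sluggish, with fvm using 25% of memory on a 2 GB server. I asked around and was told a fix had been released over the weekend. After upgrading this morning things were fine, with memory usage at about 3%, but it has now reached about 8% after roughly 5 hours. The upgrade was from 4.4.1 to 4.4.2. image

I upgraded to 4.4.2 last night, but it still uses a lot of resources: a 2-core 4 GB server dedicated to it is almost completely used up. image

My server has 2 cores and 2 GB. Running SafeLine plus a frontend is about all it can manage, and SafeLine is the biggest consumer. Once memory gets high, access becomes slow, so I have no choice but to move the other, larger services to other servers.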

QYG2297248353 avatar Mar 12 '24 04:03 QYG2297248353

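@XRJ1230663 Could you check which service specifically is using the resources?

I am no longer using 4.4.2. When I checked yesterday, luigi was using more than 50% of memory.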

XRJ1230663 avatar Mar 12 '24 05:03 XRJ1230663

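Right. This problem has existed from 4.3.3 through the latest 4.4.2. It was also reported in the WeChat group, but nothing came of it. After a restart, luigi starts using more than 100% CPU within about 3-5 minutes and stays there, and at that point the QPS display becomes abnormal. After 5-8 hours the detector container reports unhealthy (the system enters a bypass-like mode): no logs are recorded, no traffic is inspected, and tengine only does plain forwarding.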

fankejing-just avatar Mar 13 '24 01:03 fankejing-just
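The unhealthy state described above can be read from `docker inspect`, which reports a container's health check result under `State.Health.Status`. The following is a minimal sketch, assuming the docker CLI is available; the function names are hypothetical, and the container name `safeline-detector` is taken from the logs later in this thread.

```python
import json
import subprocess


def health_from_inspect(inspect_json):
    """Extract the health status from `docker inspect` JSON output.

    Returns None when the container defines no health check.
    """
    data = json.loads(inspect_json)
    health = data[0].get("State", {}).get("Health")
    return health["Status"] if health else None


def container_health(name):
    """Run `docker inspect` for one container (hypothetical helper)."""
    result = subprocess.run(
        ["docker", "inspect", name],
        capture_output=True, text=True, check=True,
    )
    return health_from_inspect(result.stdout)
```

`container_health("safeline-detector")` returns `"starting"`, `"healthy"`, or `"unhealthy"`. Polling it from a cron job is one way to notice the bypass-like mode before traffic goes uninspected for hours.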

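I took a look. This morning it was unbearably sluggish, with fvm using 25% of memory on a 2 GB server. I asked around and was told a fix had been released over the weekend. After upgrading this morning things were fine, with memory usage at about 3%, but it has now reached about 8% after roughly 5 hours. The upgrade was from 4.4.1 to 4.4.2. image

I upgraded to 4.4.2 last night, but it still uses a lot of resources: a 2-core 4 GB server dedicated to it is almost completely used up. image

My server has 2 cores and 2 GB. Running SafeLine plus a frontend is about all it can manage, and SafeLine is the biggest consumer. Once memory gets high, access becomes slow, so I have no choice but to move the other, larger services to other servers.

Never mind 2 GB: on my 64 GB machine it fills all the memory after running for a few days, and the service does not work properly either.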

fankejing-just avatar Mar 13 '24 01:03 fankejing-just

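So far it looks like the CPU/memory usage of the luigi service is abnormal. We will track down the problem as quickly as we can and fix it in a later release.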

xbingW avatar Mar 13 '24 02:03 xbingW

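Similar situation here: SafeLine WAF deployed on its own on a 2c2g cloud server. After upgrading to 4.4.1 the load became too high and I received load alerts every day; a restart restored it each time. Since upgrading to 4.4.2 it has been normal so far. WX20240314-093343@2x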

TScci avatar Mar 14 '24 01:03 TScci

Version 5.0.0 has been released. Please update to the latest version and observe again.

xbingW avatar Mar 14 '24 07:03 xbingW

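Version 5.0.0 has been released. Please update to the latest version and observe again.

I updated at 11:00 on the 14th. Twelve hours later it is very stable, with no climb in memory usage. image The footprint is also well optimized: a 2-core 2 GB machine is no longer under pressure.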

QYG2297248353 avatar Mar 14 '24 17:03 QYG2297248353

The problem has been resolved; closing the issue.

yrluke avatar Mar 22 '24 09:03 yrluke

After upgrading to 5.3.3, this problem is back...

On Fri, Mar 22, 2024 at 17:41, yrluke @.***> wrote:

The problem has been resolved; closing the issue.

fankejing-just avatar Apr 19 '24 00:04 fankejing-just

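After upgrading to 5.3.3, once QPS reaches about 100 the QPS display stops updating and stays at 0, and the protection log no longer records anything. After rolling back to 5.3.2, the problems above disappear.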

fankejing-just avatar Apr 19 '24 01:04 fankejing-just

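I wish you had said so earlier. Sigh, and I even waited several days before upgrading. I upgraded at around 12 o'clock in the morning on 2024/04/18, and today fvm has already reached a usage of 8.1.

image

@yrluke Please take a look. It has not crashed the server this time, but the usage share is still a bit high, and snserver is also slightly higher than usual.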

QYG2297248353 avatar Apr 19 '24 02:04 QYG2297248353

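@QYG2297248353 Is fvm still climbing?

It stopped climbing after that and has stayed between 8% and 9% for a day now. That is still a bit high, higher than a Spring Boot application, so I suggest optimizing it. But it is better than before: at least I no longer get alerts about slow access in the middle of the night.

image

@yrluke It has already reached, or slightly exceeded, the level seen in this issue.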

QYG2297248353 avatar Apr 19 '24 02:04 QYG2297248353

@QYG2297248353 How are luigi's CPU and memory behaving?

xbingW avatar Apr 19 '24 02:04 xbingW

luigi CPU is 0. image

QYG2297248353 avatar Apr 19 '24 03:04 QYG2297248353

In barely half a day it has reached 10%. It really is climbing steadily. image @yrluke

QYG2297248353 avatar Apr 19 '24 07:04 QYG2297248353
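Whether usage is "climbing steadily" or has levelled off can be checked by fitting a straight line to a few readings. The sketch below is an ordinary least-squares slope over hand-collected samples; the function name and the `(hours, percent)` sample format are assumptions made for this illustration.

```python
def growth_rate(samples):
    """Least-squares slope of (hours, mem_percent) samples, in percent per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    # Spread of the time values; zero means all samples share one timestamp.
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must cover more than one point in time")
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    return cov / var_t
```

With the two readings reported at the start of this thread (about 3% right after the upgrade and about 8% roughly 5 hours later), `growth_rate([(0, 3.0), (5, 8.0)])` gives 1.0 percentage point per hour. A slope near zero over a full day suggests a plateau rather than a leak.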

Could you check the fvm logs?

xbingW avatar Apr 19 '24 07:04 xbingW

docker logs safeline-fvm 
2024/04/18 18:51:00 [Fx] PROVIDE	*runner.Runner <= git.in.chaitin.net/dev/go/module.v2/runner.NewRunner()
2024/04/18 18:51:00 [Fx] SUPPLY	*config.ManagerConfig
2024/04/18 18:51:00 [Fx] SUPPLY	*gorm.DB
2024/04/18 18:51:00 [Fx] PROVIDE	*fvm.FVM <= git.in.chaitin.net/patronus/fvm/manager/module/fvm.New()
2024/04/18 18:51:00 [Fx] PROVIDE	[]*node.Client <= git.in.chaitin.net/patronus/fvm/manager/module/node.NewClient()
2024/04/18 18:51:00 [Fx] SUPPLY	*grpc.Server
2024/04/18 18:51:00 [Fx] PROVIDE	*manager.FVMServer <= git.in.chaitin.net/patronus/fvm/manager/module/rpc/fvm.NewServer()
2024/04/18 18:51:00 [Fx] PROVIDE	*node.PullServer <= git.in.chaitin.net/patronus/fvm/manager/module/rpc/node.NewServer()
2024/04/18 18:51:00 [Fx] SUPPLY	*log.Logger
2024/04/18 18:51:00 [Fx] PROVIDE	fx.Lifecycle <= go.uber.org/fx.New.func1()
2024/04/18 18:51:00 [Fx] PROVIDE	fx.Shutdowner <= go.uber.org/fx.(*App).shutdowner-fm()
2024/04/18 18:51:00 [Fx] PROVIDE	fx.DotGraph <= go.uber.org/fx.(*App).dotGraph-fm()
2024/04/18 18:51:00 [Fx] INVOKE		git.in.chaitin.net/dev/go/module.v2/runner.glob..func1()
2024/04/18 18:51:00 [Fx] INVOKE		git.in.chaitin.net/patronus/fvm/manager/module/manager.Run()

2024/04/18 18:51:00 /work/module/manager/manager.go:19 SLOW SQL >= 200ms
[714.552ms] [rows:0] CREATE TABLE `fvm_version` (`latest` integer,`oldest` integer)

2024/04/18 18:51:01 /work/module/manager/manager.go:23 SLOW SQL >= 200ms
[246.533ms] [rows:0] CREATE TABLE `fvm_update` (`version` integer,`content` blob)

2024/04/18 18:51:02 /work/module/manager/manager.go:24 SLOW SQL >= 200ms
[1069.178ms] [rows:0] CREATE TABLE `fvm_re` (`id` integer,`table` text,`content` blob,PRIMARY KEY (`id`))

2024/04/18 18:51:02 /work/module/db/db.go:78 record not found
[0.214ms] [rows:0] SELECT * FROM `fvm_update` WHERE version = 0 ORDER BY `fvm_update`.`version` LIMIT 1
2024/04/18 18:51:02 [Fx] INVOKE		git.in.chaitin.net/patronus/fvm/manager/module/rpc/fvm.Register()
2024/04/18 18:51:02 [Fx] INVOKE		git.in.chaitin.net/patronus/fvm/manager/module/rpc/node.Register()
2024/04/18 18:51:02 [Fx] INVOKE		git.in.chaitin.net/patronus/fvm/manager/module/rpc.Run()
2024/04/18 18:51:02 [Fx] START		git.in.chaitin.net/dev/go/module.v2/runner.NewRunner()
2024/04/18 18:51:02 [Module] START	git.in.chaitin.net/patronus/fvm/manager/module/rpc.Run()
2024/04/18 18:51:02 [Fx] RUNNING
2024/04/18 18:51:07 INFO refresh fsl because detector policy version is 0
2024/04/18 18:51:07 ERROR build fsl error err="database is locked\nload version from db\ngit.in.chaitin.net/patronus/fvm/manager/module/rpc/fvm.(*FVMServer).pushFsl.func1\n\t/work/module/rpc/fvm/fvm.go:118\ngorm.io/gorm.(*DB).Transaction\n\t/go/pkg/mod/gorm.io/[email protected]/finisher_api.go:647\ngit.in.chaitin.net/patronus/fvm/manager/module/rpc/fvm.(*FVMServer).pushFsl\n\t/work/module/rpc/fvm/fvm.go:111\ngit.in.chaitin.net/patronus/fvm/manager/module/rpc/fvm.(*FVMServer).check\n\t/work/module/rpc/fvm/fvm.go:63\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1650"

2024/04/18 18:51:07 /work/module/db/db.go:59 database is locked
[0.038ms] [rows:0] INSERT INTO `fvm_version` (`latest`,`oldest`) VALUES (0,0)
2024/04/18 18:51:08 INFO Push FSL success

2024/04/18 18:51:08 /work/module/db/db.go:129 database is locked
[0.052ms] [rows:0] DELETE FROM `fvm_update` WHERE version != 0
2024/04/18 18:51:08 [ERROR] fvm/fvm_grpc.pb.go:316 error:%v send to stream: failed to remove all diff: database is locked
2024/04/18 18:51:10 INFO Push FSL success
2024/04/19 01:00:03 INFO Push FSL success
2024/04/19 01:04:41 INFO Push FSL success
2024/04/19 12:18:59 INFO Push FSL success
2024/04/19 12:19:59 INFO Push FSL success
2024/04/19 12:27:32 INFO Push FSL success
2024/04/19 12:36:52 INFO Push FSL success
2024/04/19 12:37:15 INFO Push FSL success
2024/04/19 12:39:09 INFO Push FSL success
2024/04/19 12:45:00 INFO Push FSL success
2024/04/19 12:46:14 INFO Push FSL success
2024/04/19 15:29:07 ERROR get stat error err="Get response failed:\n    git.in.chaitin.net/patronus/fvm/manager/module/fvm.(*FVM).GetStat\n        /work/module/fvm/fvm.go:351\n  - Get \"http://safeline-detector:8001/stat\": dial tcp: lookup safeline-detector on 127.0.0.11:53: read udp 127.0.0.1:50439->127.0.0.11:53: i/o timeout"
2024/04/19 15:29:46 ERROR get stat error err="Get response failed:\n    git.in.chaitin.net/patronus/fvm/manager/module/fvm.(*FVM).GetStat\n        /work/module/fvm/fvm.go:351\n  - Get \"http://safeline-detector:8001/stat\": dial tcp: lookup safeline-detector on 127.0.0.11:53: read udp 127.0.0.1:44893->127.0.0.11:53: i/o timeout"
2024/04/19 15:30:15 ERROR get stat error err="Get response failed:\n    git.in.chaitin.net/patronus/fvm/manager/module/fvm.(*FVM).GetStat\n        /work/module/fvm/fvm.go:351\n  - Get \"http://safeline-detector:8001/stat\": dial tcp: lookup safeline-detector on 127.0.0.11:53: read udp 127.0.0.1:48864->127.0.0.11:53: i/o timeout"
2024/04/19 15:30:42 ERROR get stat error err="Get response failed:\n    git.in.chaitin.net/patronus/fvm/manager/module/fvm.(*FVM).GetStat\n        /work/module/fvm/fvm.go:351\n  - Get \"http://safeline-detector:8001/stat\": dial tcp: lookup safeline-detector on 127.0.0.11:53: read udp 127.0.0.1:45189->127.0.0.11:53: i/o timeout"
2024/04/19 15:31:12 ERROR get stat error err="Get response failed:\n    git.in.chaitin.net/patronus/fvm/manager/module/fvm.(*FVM).GetStat\n        /work/module/fvm/fvm.go:351\n  - Get \"http://safeline-detector:8001/stat\": dial tcp: lookup safeline-detector on 127.0.0.11:53: read udp 127.0.0.1:57307->127.0.0.11:53: i/o timeout"
2024/04/19 15:31:53 ERROR get stat error err="Get response failed:\n    git.in.chaitin.net/patronus/fvm/manager/module/fvm.(*FVM).GetStat\n        /work/module/fvm/fvm.go:351\n  - Get \"http://safeline-detector:8001/stat\": dial tcp: lookup safeline-detector on 127.0.0.11:53: read udp 127.0.0.1:45976->127.0.0.11:53: i/o timeout"
2024/04/19 15:32:18 ERROR get stat error err="Get response failed:\n    git.in.chaitin.net/patronus/fvm/manager/module/fvm.(*FVM).GetStat\n        /work/module/fvm/fvm.go:351\n  - Get \"http://safeline-detector:8001/stat\": dial tcp: lookup safeline-detector on 127.0.0.11:53: read udp 127.0.0.1:49508->127.0.0.11:53: i/o timeout"

QYG2297248353 avatar Apr 19 '24 07:04 QYG2297248353

It looks like there is a network problem inside the containers. Could you look into why this happens: Get "http://safeline-detector:8001/stat": dial tcp: lookup safeline-detector on 127.0.0.11:53: read udp 127.0.0.1:50439->127.0.0.11:53: i/o timeout

xbingW avatar Apr 19 '24 08:04 xbingW
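The failing step in that error is the name lookup against 127.0.0.11, Docker's embedded DNS server, not the HTTP request itself. A rough way to reproduce it is to time the same lookup from a container on the same Docker network. This sketch assumes Python is available in such a container, which the thread does not confirm; the function name is hypothetical.

```python
import socket
import time


def time_lookup(hostname, port=8001):
    """Resolve hostname once.

    Returns (elapsed_seconds, sorted addresses or None, error text or None).
    """
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
        # Each entry ends with a sockaddr tuple whose first item is the address.
        addrs = sorted({info[4][0] for info in infos})
        err = None
    except socket.gaierror as exc:
        addrs, err = None, str(exc)
    return time.monotonic() - start, addrs, err
```

Calling `time_lookup("safeline-detector")` repeatedly while the host is under load shows whether lookups are slow or failing only intermittently, which would match the scattered `i/o timeout` entries in the fvm log.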

Now that is asking a lot of me; I have no idea where to start. The safeline-detector logs are all like this:

[2024-04-19 15:32:17.329] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:32:17.329] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:32:17.329] [1] [ERROR] send weblog error: error trying to connect: dns error: failed to lookup address information: Temporary failure in name resolution
[2024-04-19 15:33:50.581] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:50.580] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:50.670] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:50.670] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:51.990] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:51.990] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:52.140] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:52.210] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:52.210] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:52.210] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:52.210] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:52.211] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:52.211] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:52.211] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:52.211] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:52.211] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:52.591] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:33:58.851] [1] [ERROR] send weblog error: connection error: Connection reset by peer (os error 104)
[2024-04-19 15:34:18.364] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:34:24.893] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:09.970] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:11.540] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:13.551] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:13.720] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:13.720] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:13.831] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:13.831] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:13.831] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:13.831] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:13.831] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:13.831] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:13.910] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:13.911] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:14.121] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:14.121] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:14.121] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:14.121] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:14.121] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:14.122] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:14.122] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:14.122] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:14.122] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:14.122] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:14.122] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }
[2024-04-19 15:35:14.122] [1] [ERROR] failed to send detect response, Os { code: 32, kind: BrokenPipe, message: "Broken pipe" }

QYG2297248353 avatar Apr 19 '24 09:04 QYG2297248353
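A long run of near-identical log lines like the one above is easier to read as counts per error type. The following is a small sketch that tallies lines by level and message prefix. The regular expression is written against the line layout visible in this excerpt and is an assumption about the detector's log format in general.

```python
import re
from collections import Counter

# Matches lines such as: [2024-04-19 15:32:17.329] [1] [ERROR] message...
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[\d+\] \[(?P<level>\w+)\] (?P<msg>.*)$")


def tally(log_text):
    """Count log lines by (level, message text before the first comma or colon)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # ignore lines in any other layout
        prefix = re.split(r"[,:]", match.group("msg"), maxsplit=1)[0]
        counts[(match.group("level"), prefix)] += 1
    return counts
```

Feeding it the output of `docker logs safeline-detector` and printing `tally(text).most_common()` separates the `failed to send detect response` broken-pipe errors from the rarer `send weblog error` entries, which are the ones that mention name resolution.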

Is fvm still climbing? What is it at now?

xbingW avatar Apr 19 '24 09:04 xbingW

image

QYG2297248353 avatar Apr 19 '24 09:04 QYG2297248353

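I set up a 2c4g environment and tested at 1000 QPS for 30 minutes, and did not see continuous growth. Next week's release will ship fvm with pprof. Could you help collect some data then so we can take another look?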

xbingW avatar Apr 19 '24 09:04 xbingW

The thing is, the previous version did not have this problem, and it had even been fixed before. Sigh.

QYG2297248353 avatar Apr 19 '24 09:04 QYG2297248353

We will keep observing this.

xbingW avatar Apr 19 '24 10:04 xbingW