mvcc: database space exceeded.

Open • topicgit opened this issue 6 years ago • 12 comments

Please answer these questions before submitting your issue. Thanks!

  1. What version of Go and cronsun are you using? The downloaded release package, cronsun v0.3.5.

  2. What operating system and processor architecture are you using (go env)? CentOS 7, with three etcd nodes and a three-node MongoDB replica set.

  3. What did you do? If possible, provide a recipe for reproducing the error. A complete runnable program is good.

  4. Some jobs keep failing to run for no apparent reason. Restarting does not help, and sometimes newly added jobs also stop running on their own. [image]

  5. cronsun-web keeps reporting errors when connecting to etcd:
     2019-04-25T14:13:48.515+0800 ERROR web/base.go:87 Failed to store session: etcd create new lease faild: etcdserver: mvcc: database space exceeded.
     github.com/shunfei/cronsun/log.Errorf
         /Users/qleelulu/dev/source/golang/src/github.com/shunfei/cronsun/log/log.go:39
     github.com/shunfei/cronsun/web.authHandler.func1.1
         /Users/qleelulu/dev/source/golang/src/github.com/shunfei/cronsun/web/base.go:87
     github.com/shunfei/cronsun/web.(*Context).Done
         /Users/qleelulu/dev/source/golang/src/github.com/shunfei/cronsun/web/base.go:56
     github.com/shunfei/cronsun/web.BaseHandler.ServeHTTP
         /Users/qleelulu/dev/source/golang/src/github.com/shunfei/cronsun/web/base.go:151
     github.com/gorilla/mux.(*Router).ServeHTTP
         /Users/qleelulu/go/pkg/mod/github.com/gorilla/[email protected]/mux.go:162
     net/http.serverHandler.ServeHTTP
         /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2741
     net/http.(*conn).serve
         /usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:1847

My etcd is already started with the --auto-compaction-retention=1 flag.

  1. What did you expect to see? To find the root cause of the two problems above.
  2. What did you see instead?

topicgit • Apr 25 '19 06:04

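This is an etcd cluster problem: in this state etcd refuses to write new data. First check whether each node has enough free space available for etcd. You can also refer to limit.md#storage-size-limit and try raising the storage quota.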

For the official fix, see the etcd FAQ entry "What does mvcc: database space exceeded mean and how do I fix it?".
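As a rough sketch of the "raise the storage quota" suggestion: etcd's backend quota is set in bytes with the `--quota-backend-bytes` flag, and the etcd documentation describes a 2 GB default with 8 GB as a suggested maximum. The helper names `quota_bytes` and `quota_flag` below are hypothetical, and the endpoint in the commented commands is a placeholder, not something taken from this cluster.

```shell
# Convert a quota given in GiB into the byte count that --quota-backend-bytes expects.
quota_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Hypothetical helper: print the extra etcd flag for a quota of N GiB.
quota_flag() {
  echo "--quota-backend-bytes=$(quota_bytes "$1")"
}

# Example usage (commented out because it needs a live cluster):
#   ETCDCTL_API=3 etcdctl --endpoints=http://127.0.0.1:2379 alarm list   # a NOSPACE alarm confirms the quota was hit
#   /opt/etcd/etcd <existing flags> $(quota_flag 8)                      # restart each member with a larger quota
```

Raising the quota only buys headroom. If old revisions keep accumulating, the database can still grow back to the new limit.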

Doflatango • Apr 25 '19 07:04

--auto-compaction-retention=1 My etcd was already started with this flag.

topicgit • Apr 25 '19 13:04

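How many running jobs can this platform support in total? I have 31 jobs failing. After I deleted and recreated them they worked, but within a few hours another 31 jobs broke. How can I troubleshoot this? Thanks.

topicgit • Apr 25 '19 21:04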

@Doflatango @QLeelulu I found the following logs on the node, which may be why these jobs are not running:

2019-04-26T05:26:47.004+0800 INFO cronsun/job.go:254 job[/cronsun/cmd/xx-抓取-源-xx/4f4f11e2] didn't get a lock, err: etcdserver: mvcc: database space exceeded
2019-04-26T05:26:48.004+0800 INFO cronsun/job.go:254 job[/cronsun/cmd/xx-xxx-spider/3d4f11e2] didn't get a lock, err: etcdserver: mvcc: database space exceeded

How should this be resolved? The etcd store can now be read and written normally. Thanks for your help.

topicgit • Apr 25 '19 21:04

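> --auto-compaction-retention=1 My etcd was already started with this flag.

Check the storage situation on each etcd node. If your etcd already holds a large number of keys and they are all very active, this flag may not help. Also, is your cluster new? Are other applications using it?

> How many running jobs can this platform support in total? I have 31 jobs failing. After I deleted and recreated them they worked, but within a few hours another 31 jobs broke. How can I troubleshoot this? Thanks.

What do you mean by "broke"? I suspect it is caused by the same problem. Each job's data is roughly the same size, and once the data on any node of your etcd cluster reaches the storage limit, etcd refuses to write new data. Is there something you might have overlooked?

Doflatango • Apr 26 '19 01:04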

I don't know what your etcd cluster's startup configuration looks like. Try adding this:

--max-snapshots=0 --max-wals=0

https://github.com/etcd-io/etcd/issues/10312#issuecomment-446167072

Doflatango • Apr 26 '19 02:04

[program: etcd]
command=/opt/etcd/etcd --name node1 --initial-advertise-peer-urls http://172.30.118.166:2380 --listen-peer-urls http://172.30.118.166:2380 --listen-client-urls http://172.30.118.166:2379,http://127.0.0.1:2379 --advertise-client-urls http://172.30.118.166:2379 --initial-cluster-token etcd-cluster-1 --initial-cluster node0=http://172.30.181.162:2380,node1=http://172.30.118.166:2380,node2=http://172.30.61.120:2380 --initial-cluster-state new --data-dir=/opt/data/etcd --auto-compaction-retention=1
user=work
directory=/opt/data/etcd
autostart=true
autorestart=true
stopwaitsecs=10
startretries=3
redirect_stderr = true
stdout_logfile=/opt/logs/etcd/etcd.log
stdout_logfile_maxbytes=100MB

@Doflatango Thanks, this is my etcd configuration. I have moved the etcd data onto an SSD and will keep watching it.

topicgit • Apr 26 '19 07:04

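How do I turn on debug mode for the node? Our jobs have inexplicably stopped running: nothing is actually executed and there are no logs in MongoDB, yet the next scheduled time shown is correct. I have confirmed that etcd and MongoDB are fine. @Doflatango @QLeelulu could you give me some pointers?

topicgit • Apr 29 '19 12:04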

ERROR web/job.go:263 GetJobLatestLogListByJobIds error: Executor error during find command :: caused by :: errmsg: "Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit

Which setting causes this error?

gitzhuyongfu • Jul 19 '19 02:07

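I ran into a similar problem. After looking at the logs and the code, I found that MongoDB exceeded its default 32 MB memory limit while sorting the data. You can fix it by manually adding indexes to the MongoDB collections: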

use cronsun;
db.job_log.ensureIndex({"beginTime":1});
db.job_latest_log.ensureIndex({"beginTime":1})

kong36088 • Nov 12 '19 11:11

@topicgit See https://github.com/etcd-io/etcd/blob/a621d807f061e1dd635033a8d6bc261461429e27/Documentation/op-guide/maintenance.md#history-compaction. These are the two flags I use:

--auto-compaction-mode=revision --auto-compaction-retention=1000

Your cluster is done for. You can run etcdctl compact and etcdctl defrag, then fix the flags and start it again. https://github.com/etcd-io/etcd/blob/a621d807f061e1dd635033a8d6bc261461429e27/Documentation/op-guide/maintenance.md#defragmentation
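The compact-then-defrag recovery could look roughly like the sketch below, which follows the sequence in the etcd maintenance guide: read the current revision, compact up to it, defragment, then disarm the alarm. It assumes etcd v3 with `ETCDCTL_API=3` and that the JSON status output contains a `"revision":<number>` field. The function names `extract_revision` and `reclaim_etcd_space` are hypothetical.

```shell
# Pull the current revision number out of `endpoint status --write-out=json` output read from stdin.
extract_revision() {
  grep -Eo '"revision":[0-9]+' | grep -Eo '[0-9]+' | head -n 1
}

# Hypothetical wrapper; pass the client URL of the member to repair.
# Each step runs only if the previous one succeeded.
reclaim_etcd_space() {
  endpoint="$1"
  rev=$(ETCDCTL_API=3 etcdctl --endpoints="$endpoint" endpoint status --write-out=json | extract_revision) &&
  ETCDCTL_API=3 etcdctl --endpoints="$endpoint" compact "$rev" &&   # drop key history older than $rev
  ETCDCTL_API=3 etcdctl --endpoints="$endpoint" defrag &&           # release the freed space back to the filesystem
  ETCDCTL_API=3 etcdctl --endpoints="$endpoint" alarm disarm        # clear the NOSPACE alarm so writes are accepted again
}

# Example (needs a live cluster): reclaim_etcd_space http://127.0.0.1:2379
```

Defragmentation applies to the member it is sent to, so the defrag step would need repeating against each member's endpoint.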

Keep an eye on the etcd cluster status from time to time.

$ etcdctl endpoint status --cluster --write-out=table
+----------------------------+------------------+---------+---------+-----------+-----------+------------+
|          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------------------+------------------+---------+---------+-----------+-----------+------------+
| http://xxx.xx.xxx.xxx:2379 | 6af95b102177a723 |  3.3.12 |  1.0 MB |     false |       165 |   47445483 |
| http://xxx.xx.xxx.xxx:2379 | 7decea3fa3a82bf2 |  3.3.12 |  799 kB |     false |       165 |   47445483 |
| http://xxx.xx.xxx.xxx:2379 | a8baaf934db2abcd |  3.3.12 |  799 kB |      true |       165 |   47445483 |
+----------------------------+------------------+---------+---------+-----------+-----------+------------+
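A periodic check of the DB SIZE column can be scripted. The sketch below assumes the JSON form of the same command (`--write-out=json`) reports each member's size as a `"dbSize":<bytes>` field. `db_size_ok` is a hypothetical helper name and the 1.5 GiB threshold in the usage comment is only an illustration.

```shell
# Read `etcdctl endpoint status --write-out=json` output on stdin.
# Succeeds only if every reported dbSize is below the limit (in bytes) given as $1.
# Input containing no dbSize field at all also counts as success.
db_size_ok() {
  limit="$1"
  for size in $(grep -Eo '"dbSize":[0-9]+' | grep -Eo '[0-9]+'); do
    [ "$size" -lt "$limit" ] || return 1
  done
  return 0
}

# Example (needs a live cluster), warning at 1.5 GiB:
#   ETCDCTL_API=3 etcdctl endpoint status --cluster --write-out=json \
#     | db_size_ok $((1536 * 1024 * 1024)) || echo "etcd DB size is near the quota"
```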

cupen • Sep 15 '20 07:09

Thanks, I'll give it a try.

topicgit • Sep 28 '20 03:09