cronsun mvcc: database space exceeded.

Please answer these questions before submitting your issue. Thanks!

What version of Go and cronsun are you using? The downloaded release package, cronsun-v0.3.5.
What operating system and processor architecture are you using (go env)? CentOS 7, with three etcd nodes and a three-node MongoDB replica set.
What did you do? If possible, provide a recipe for reproducing the error. A complete runnable program is good.
Some jobs stop running for no apparent reason. Restarting doesn't help, and sometimes even newly added jobs stop running on their own.
cronsun-web keeps reporting errors when connecting to etcd:

2019-04-25T14:13:48.515+0800 ERROR web/base.go:87 Failed to store session: etcd create new lease faild: etcdserver: mvcc: database space exceeded.
github.com/shunfei/cronsun/log.Errorf
	/Users/qleelulu/dev/source/golang/src/github.com/shunfei/cronsun/log/log.go:39
github.com/shunfei/cronsun/web.authHandler.func1.1
	/Users/qleelulu/dev/source/golang/src/github.com/shunfei/cronsun/web/base.go:87
github.com/shunfei/cronsun/web.(*Context).Done
	/Users/qleelulu/dev/source/golang/src/github.com/shunfei/cronsun/web/base.go:56
github.com/shunfei/cronsun/web.BaseHandler.ServeHTTP
	/Users/qleelulu/dev/source/golang/src/github.com/shunfei/cronsun/web/base.go:151
github.com/gorilla/mux.(*Router).ServeHTTP
	/Users/qleelulu/go/pkg/mod/github.com/gorilla/[email protected]/mux.go:162
net/http.serverHandler.ServeHTTP
	/usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:2741
net/http.(*conn).serve
	/usr/local/Cellar/go/1.11/libexec/src/net/http/server.go:1847

My etcd is already started with the --auto-compaction-retention=1 flag.

What did you expect to see? The root cause of the two problems above.
What did you see instead?

Apr 25 '19 06:04 topicgit

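This is a problem with the etcd cluster: in this state, etcd refuses to accept new writes. First check whether each node has enough free space left for etcd. You can also try raising the storage quota as described in limit.md#storage-size-limit.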

For the official fix, see: what does mvcc database space exceeded mean and how do i fix it.

Apr 25 '19 07:04 Doflatango

I already added the --auto-compaction-retention=1 flag when starting etcd.

Apr 25 '19 13:04 topicgit

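How many running jobs can this platform support in total? 31 of my jobs failed. I deleted and recreated them and they worked again, but within a few hours another 31 jobs broke. How should I troubleshoot this? Thanks.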

Apr 25 '19 21:04 topicgit

@Doflatango @QLeelulu I found the following logs on a node, which may be why these jobs aren't running:

2019-04-26T05:26:47.004+0800 INFO cronsun/job.go:254 job[/cronsun/cmd/xx-抓取-源-xx/4f4f11e2] didn't get a lock, err: etcdserver: mvcc: database space exceeded
2019-04-26T05:26:48.004+0800 INFO cronsun/job.go:254 job[/cronsun/cmd/xx-xxx-spider/3d4f11e2] didn't get a lock, err: etcdserver: mvcc: database space exceeded

How should I fix this? etcd can now read and write normally again. Thanks for your help.

Apr 25 '19 21:04 topicgit

Quoting topicgit: "I already added the --auto-compaction-retention=1 flag when starting etcd."

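Check the storage on each etcd node. If your etcd already holds a large number of keys and they are all very active, this flag won't necessarily help. Also, is your cluster new? Are other applications using it?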
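A time-based retention flag may not be enough when keys are very active. The toy model below illustrates why. It is a deliberately simplified sketch of MVCC history, not etcd's code; `toyStore` and the key names are hypothetical. Every write creates a new revision, so a handful of keys that are rewritten constantly still pile up history. Compacting at a revision drops older revisions but keeps each key's value as of that revision. In real etcd, compaction only frees space inside the backend file for reuse. The etcd maintenance guide pairs it with defragmentation to actually return that space to the filesystem.

```go
package main

import "fmt"

// toyStore is a deliberately simplified model of MVCC history (NOT etcd's
// implementation): every Put creates a new global revision, and old revisions
// of a key stay stored until they are compacted away.
type toyStore struct {
	rev     int64
	history map[string][]int64 // key -> ascending revisions at which it was written
}

func newToyStore() *toyStore {
	return &toyStore{history: make(map[string][]int64)}
}

// Put records a new revision for key.
func (s *toyStore) Put(key string) {
	s.rev++
	s.history[key] = append(s.history[key], s.rev)
}

// Compact discards revisions older than rev, but keeps the newest revision
// below rev for each key so that the key's value as of rev stays readable.
func (s *toyStore) Compact(rev int64) {
	for key, revs := range s.history {
		start := 0
		for i, r := range revs {
			if r < rev {
				start = i // newest revision strictly below rev
			}
		}
		s.history[key] = revs[start:]
	}
}

// StoredRevisions counts every revision still kept across all keys.
func (s *toyStore) StoredRevisions() int {
	n := 0
	for _, revs := range s.history {
		n += len(revs)
	}
	return n
}

func main() {
	const keys, rounds, retain = 10, 100, 50 // hypothetical workload

	s := newToyStore()
	for i := 0; i < rounds; i++ {
		for k := 0; k < keys; k++ {
			s.Put(fmt.Sprintf("key-%d", k)) // the same few keys, rewritten constantly
		}
	}
	fmt.Println("keys:", len(s.history), "revisions before compaction:", s.StoredRevisions())

	// Revision-style retention: keep roughly the last `retain` revisions.
	s.Compact(s.rev - retain)
	fmt.Println("revisions after compaction:", s.StoredRevisions())
}
```

With only 10 keys, the history still grows by one revision per write until a compaction runs. That is why the number of keys alone says little about how large the database gets.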

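Quoting topicgit: "How many running jobs can this platform support in total? 31 of my jobs failed. I deleted and recreated them and they worked again, but within a few hours another 31 jobs broke. How should I troubleshoot this? Thanks."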

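What exactly do you mean by the jobs "breaking"? I'd guess it's the same problem. Every job's data is about the same size, and once the data on one of your etcd nodes reaches the storage limit, etcd rejects new writes. Think about whether there's something you may have overlooked.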

Apr 26 '19 01:04 Doflatango

What does your etcd cluster's startup configuration look like? Try adding these flags:

--max-snapshots=0 --max-wals=0

https://github.com/etcd-io/etcd/issues/10312#issuecomment-446167072

Apr 26 '19 02:04 Doflatango

[program: etcd]
command=/opt/etcd/etcd --name node1 --initial-advertise-peer-urls http://172.30.118.166:2380 --listen-peer-urls http://172.30.118.166:2380 --listen-client-urls http://172.30.118.166:2379,http://127.0.0.1:2379 --advertise-client-urls http://172.30.118.166:2379 --initial-cluster-token etcd-cluster-1 --initial-cluster node0=http://172.30.181.162:2380,node1=http://172.30.118.166:2380,node2=http://172.30.61.120:2380 --initial-cluster-state new --data-dir=/opt/data/etcd --auto-compaction-retention=1
user=work
directory=/opt/data/etcd
autostart=true
autorestart=true
stopwaitsecs=10
startretries=3
redirect_stderr = true
stdout_logfile=/opt/logs/etcd/etcd.log
stdout_logfile_maxbytes=100MB

@Doflatango Thanks, this is my etcd config. I've moved etcd's data onto an SSD and will keep an eye on it.

Apr 26 '19 07:04 topicgit

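How do I enable debug mode on a node? Our jobs stop running for no apparent reason. They are never actually executed and there are no logs in MongoDB, yet the next scheduled time shown is correct. I've confirmed that etcd and MongoDB are fine. @Doflatango @QLeelulu could you give me some pointers?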

Apr 29 '19 12:04 topicgit

ERROR web/job.go:263 GetJobLatestLogListByJobIds error: Executor error during find command :: caused by :: errmsg: "Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit

Which setting causes this error?

Jul 19 '19 02:07 gitzhuyongfu


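I ran into a similar problem. After reading the logs and the code, I found that MongoDB exceeds its default 32 MB (33554432 bytes) memory limit when sorting the data. You can fix it by manually adding indexes to the MongoDB collections: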

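use cronsun
// Index the sort field so MongoDB can avoid a blocking in-memory sort
// (createIndex replaces the deprecated ensureIndex).
db.job_log.createIndex({"beginTime": 1});
db.job_latest_log.createIndex({"beginTime": 1});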
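The error message itself names the two ways out: "Add an index, or specify a smaller limit". Roughly speaking, without a usable index MongoDB has to buffer the matching documents in memory to sort them. An index on the sort key lets it read them in order instead. With a limit, it only needs to keep the top k documents. The Go sketch below illustrates the limit idea with a bounded min-heap. `logEntry` and `newestK` are hypothetical names, and this is not MongoDB's implementation.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// logEntry is a hypothetical stand-in for a job log document.
type logEntry struct {
	JobID     string
	BeginTime int64 // Unix seconds
}

// oldestFirst is a min-heap on BeginTime: the root is the oldest entry kept.
type oldestFirst []logEntry

func (h oldestFirst) Len() int            { return len(h) }
func (h oldestFirst) Less(i, j int) bool  { return h[i].BeginTime < h[j].BeginTime }
func (h oldestFirst) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *oldestFirst) Push(x interface{}) { *h = append(*h, x.(logEntry)) }
func (h *oldestFirst) Pop() interface{} {
	old := *h
	last := old[len(old)-1]
	*h = old[:len(old)-1]
	return last
}

// newestK scans entries once and returns the k with the latest BeginTime,
// newest first, never buffering more than k entries.
func newestK(entries []logEntry, k int) []logEntry {
	if k <= 0 {
		return nil
	}
	h := &oldestFirst{}
	for _, e := range entries {
		if h.Len() < k {
			heap.Push(h, e)
		} else if e.BeginTime > (*h)[0].BeginTime {
			// Replace the oldest kept entry and restore heap order.
			(*h)[0] = e
			heap.Fix(h, 0)
		}
	}
	out := []logEntry(*h)
	sort.Slice(out, func(i, j int) bool { return out[i].BeginTime > out[j].BeginTime })
	return out
}

func main() {
	logs := []logEntry{
		{"a", 1556172000}, {"b", 1556175600}, {"a", 1556168400},
		{"c", 1556179200}, {"b", 1556164800},
	}
	for _, e := range newestK(logs, 2) {
		fmt.Println(e.JobID, e.BeginTime)
	}
}
```

Memory stays proportional to k rather than to the number of matching documents. That is why a smaller limit can keep a sort under the cap even without an index.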

Nov 12 '19 11:11 kong36088

@topicgit See https://github.com/etcd-io/etcd/blob/a621d807f061e1dd635033a8d6bc261461429e27/Documentation/op-guide/maintenance.md#history-compaction. I use these two flags:

--auto-compaction-mode=revision --auto-compaction-retention=1000

Your cluster is in bad shape. You can run etcdctl compact and etcdctl defrag, then restart it with adjusted flags. https://github.com/etcd-io/etcd/blob/a621d807f061e1dd635033a8d6bc261461429e27/Documentation/op-guide/maintenance.md#defragmentation
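The recovery described above can be scripted. The sketch below is a hypothetical Go wrapper (`etcd-recover` and `recoverySteps` are illustrative names) that runs the real etcdctl v3 commands in the order the etcd maintenance guide gives: `compact` up to the current revision, `defrag`, and finally `alarm disarm`. The last step clears the NOSPACE alarm so the cluster accepts writes again. The current revision must be supplied by the caller. It can be read from the `revision` field in the header of `etcdctl endpoint status --write-out=json`. Treat this as a starting point and review it before running it against a production cluster.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strconv"
)

// recoverySteps returns, in order, the etcdctl invocations for the
// compact -> defrag -> alarm disarm recovery sequence.
func recoverySteps(endpoints string, rev int64) [][]string {
	ep := "--endpoints=" + endpoints
	return [][]string{
		// 1. Drop all key history older than rev (cluster-wide).
		{"etcdctl", ep, "compact", strconv.FormatInt(rev, 10)},
		// 2. Rewrite each listed member's backend to return freed space to the filesystem.
		{"etcdctl", ep, "defrag"},
		// 3. Clear the NOSPACE alarm so writes are accepted again.
		{"etcdctl", ep, "alarm", "disarm"},
	}
}

func main() {
	if len(os.Args) != 3 {
		fmt.Fprintln(os.Stderr, "usage: etcd-recover <comma-separated endpoints> <current revision>")
		os.Exit(2)
	}
	rev, err := strconv.ParseInt(os.Args[2], 10, 64)
	if err != nil {
		fmt.Fprintln(os.Stderr, "invalid revision:", err)
		os.Exit(2)
	}
	for _, argv := range recoverySteps(os.Args[1], rev) {
		fmt.Println("running:", argv)
		cmd := exec.Command(argv[0], argv[1:]...)
		// etcdctl 3.3 defaults to the v2 API; these commands need ETCDCTL_API=3.
		cmd.Env = append(os.Environ(), "ETCDCTL_API=3")
		cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
		if err := cmd.Run(); err != nil {
			fmt.Fprintln(os.Stderr, "step failed:", err)
			os.Exit(1)
		}
	}
}
```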

Check the etcd cluster status every so often:

$ etcdctl endpoint status --cluster --write-out=table
+----------------------------+------------------+---------+---------+-----------+-----------+------------+
|          ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------------------+------------------+---------+---------+-----------+-----------+------------+
| http://xxx.xx.xxx.xxx:2379 | 6af95b102177a723 |  3.3.12 |  1.0 MB |     false |       165 |   47445483 |
| http://xxx.xx.xxx.xxx:2379 | 7decea3fa3a82bf2 |  3.3.12 |  799 kB |     false |       165 |   47445483 |
| http://xxx.xx.xxx.xxx:2379 | a8baaf934db2abcd |  3.3.12 |  799 kB |      true |       165 |   47445483 |
+----------------------------+------------------+---------+---------+-----------+-----------+------------+
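The status check above can also be automated. The following is a minimal Go sketch that reads `etcdctl endpoint status --cluster --write-out=json` from stdin and reports each member's DB size as a fraction of the quota. It assumes the JSON is a list of objects with an `Endpoint` field and a `Status` object holding `dbSize` and `header.revision`. It also assumes the default 2 GiB quota, which applies when `--quota-backend-bytes` is not set. The 80% warning threshold and all Go names are hypothetical. If you raised the quota, pass your own value to `checkUsage`.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// defaultQuotaBytes is etcd's default backend quota (2 GiB), used when
// --quota-backend-bytes is not set.
const defaultQuotaBytes int64 = 2 << 30

// warnRatio is a hypothetical alert threshold (80% of the quota).
const warnRatio = 0.8

// endpointStatus models only the fields read here from
// `etcdctl endpoint status --write-out=json` (assumed shape).
type endpointStatus struct {
	Endpoint string `json:"Endpoint"`
	Status   struct {
		Header struct {
			Revision int64 `json:"revision"`
		} `json:"header"`
		DBSize int64 `json:"dbSize"`
	} `json:"Status"`
}

// memberUsage is one member's backend size relative to the quota.
type memberUsage struct {
	Endpoint string
	Revision int64
	DBSize   int64
	Ratio    float64
}

// checkUsage decodes the status list and computes each member's quota usage.
func checkUsage(raw []byte, quota int64) ([]memberUsage, error) {
	var statuses []endpointStatus
	if err := json.Unmarshal(raw, &statuses); err != nil {
		return nil, err
	}
	usage := make([]memberUsage, 0, len(statuses))
	for _, s := range statuses {
		usage = append(usage, memberUsage{
			Endpoint: s.Endpoint,
			Revision: s.Status.Header.Revision,
			DBSize:   s.Status.DBSize,
			Ratio:    float64(s.Status.DBSize) / float64(quota),
		})
	}
	return usage, nil
}

func main() {
	// e.g. ETCDCTL_API=3 etcdctl endpoint status --cluster --write-out=json | go run main.go
	raw, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read stdin:", err)
		os.Exit(1)
	}
	usage, err := checkUsage(raw, defaultQuotaBytes)
	if err != nil {
		fmt.Fprintln(os.Stderr, "parse status:", err)
		os.Exit(1)
	}
	for _, u := range usage {
		note := ""
		if u.Ratio >= warnRatio {
			note = "  <-- close to quota"
		}
		fmt.Printf("%s revision=%d dbSize=%d (%.1f%% of quota)%s\n",
			u.Endpoint, u.Revision, u.DBSize, u.Ratio*100, note)
	}
}
```

The space alarm can be raised by a single member, so it is worth looking at the largest `dbSize` rather than an average across the cluster.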

Sep 15 '20 07:09 cupen


Thanks, I'll give it a try.

Sep 28 '20 03:09 topicgit
