High Disk Usage and Sync Issues with gaiad v14.1.0
Summary of Bug
After upgrading to gaiad version 14.1.0, I've encountered two significant issues:
- Excessive Disk Write: The gaiad process is writing approximately 20GB of data to the data directory daily, which seems unusually high compared to previous versions.
- Intermittent Sync Delays: Every few hours, the synchronization process slows down dramatically, nearly halting, before eventually continuing. This erratic behavior was not observed in earlier versions.
These issues have only arisen following the recent upgrade to version 14.
Version
I am currently running gaiad version v14.1.0.
Steps to Reproduce
- Followed the official Cosmos Quickstart Guide to set up gaiad.
- Ran an upgrade from a previous version to v14.1.0.
- Noticed the issues shortly after the upgrade was completed and the node began normal operations.
Environment:
- Instance Type: AWS EC2
- Specifications: 4 vCPUs, 32GB RAM
- Resource Utilization: ~50% CPU and 15GB RAM usage
- Additional Context: This is an upgrade from an older version of gaiad, and I have not installed gaiacli as per the new recommendations.
Expected vs. Actual Behavior:
- Expected: Normal disk write operations and consistent syncing performance as experienced in previous versions.
- Actual: Unusually high disk write activity (~20GB/day) and periodic, significant sync slowdowns.
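To put a number like ~20GB/day on firmer footing, one rough approach is to sample the size of the data directory twice and extrapolate the growth to a 24-hour window. This is only a sketch; the sampling commands and the `growth_gb_per_day` helper are illustrative, not part of gaiad, and the data directory path depends on your `--home` setting.

```shell
# Sample the data directory size in bytes, e.g.:
#   du -sb "$HOME/.gaia/data" | cut -f1
# once now and once an hour later, then extrapolate.

# growth_gb_per_day BEFORE_BYTES AFTER_BYTES INTERVAL_SECONDS
growth_gb_per_day() {
  local before=$1 after=$2 interval=$3
  # Scale the observed growth to 24h (86400s), then convert bytes to GiB.
  echo $(( (after - before) * 86400 / interval / 1024 / 1024 / 1024 ))
}

# Example: 1 GiB of growth over one hour extrapolates to 24 GiB/day.
growth_gb_per_day 0 1073741824 3600
```

Note that `du` measures on-disk size, not cumulative writes, so compaction and pruning can make this an underestimate of actual write traffic; tools like `iostat` give a more direct view of write volume.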
For Admin Use
- [ ] Not duplicate issue
- [ ] Appropriate labels applied
- [ ] Appropriate contributors tagged
- [ ] Contributor assigned/self-assigned
- [ ] Is a spike necessary to map out how the issue should be approached?
Thanks @ronigk8io for reporting, we'll check with our validator team to see if they're experiencing the same issues.
EDIT: The validator team came back and said they haven't seen missing blocks for at least a couple of weeks. We did have a syncing problem related to proposal vote counting; I'm not sure whether that issue has been fixed, but it lies with the SDK version we're using and predates v14 by a long way. Regarding storage, the team asked about the pruning level you've set: they see ~6GB/day with the pruning interval at 100. Can you provide more details on your pruning configuration and, if possible, when the sync slowdowns occurred? I can then check whether they were proposal-related or whether other validators hit the same issue at the same time.
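For reference, the pruning behavior being asked about lives in the node's `app.toml`. A custom setup roughly matching the validator team's "interval at 100" figure might look like the following; the values here are illustrative assumptions, so check your own `~/.gaia/config/app.toml` rather than copying them verbatim.

```toml
# ~/.gaia/config/app.toml (excerpt) -- values are illustrative
# pruning strategies: "default", "nothing", "everything", "custom"
pruning = "custom"
# number of recent heights to keep on disk
pruning-keep-recent = "100"
# frequency (in blocks) at which pruned heights are removed
pruning-interval = "100"
```

With `pruning = "nothing"` (archive mode), disk growth in the tens of GB per day would be expected, which is why the pruning setting matters here.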
@mmulji-ic I can confirm, sync speed is down on my nodes as well. I use the Dockerised version of the node, all AWS, r5.xlarge. The node also brought down the instance twice due to high disk usage, and it takes insanely long to sync it back.
Thanks @fmira21, will follow up with the comet team on this one.
v14.x is no longer active on mainnet.
The network migrated to v15.x, which used comet v0.37.x and cosmos-sdk v0.47.x. Earlier this week, the network upgraded to v16.x.
This issue may no longer be relevant, and the discussion seems to have gone stale.
cosmos-sdk v0.47.x and comet v0.37.x brought many improvements, and there have been no recent reports of high disk usage.
Please feel free to reopen if this happens again.