Running out of disk space
Summary
We are having a lot of trouble running our bee node, which serves the swarm downloader (our hackathon project).
- We are running a bee node under https://swarm.dapplets.org
- The node takes all available space on the HDD and then starts rejecting the files we upload. Waiting for the swarm hash either fails immediately or takes far too long. We have set db-capacity: 2621440 chunks (approx. 10 GB) and left 5 GB of free space, but the disk still gets fully consumed.
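As a sanity check on that estimate (Swarm chunks are 4 KiB; this ignores index overhead):
# 2621440 chunks x 4096 bytes per chunk
echo $((2621440 * 4096))   # 10737418240 bytes = 10 GiB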
Steps to reproduce
- Created a VPS in Hetzner with the following hardware (CX11, 1 vCPU, 2 GB RAM, 20 GB disk) running Ubuntu 20.04.2 LTS
- Installed Bee via
wget https://github.com/ethersphere/bee/releases/download/v0.5.0/bee_0.5.0_amd64.deb
sudo dpkg -i bee_0.5.0_amd64.deb
- Configured it as shown in the config below
- Installed the nginx web server and configured a reverse proxy from https://swarm.dapplets.org to http://localhost:1633 with SSL from Let's Encrypt
- Uploaded files to the node via POST https://swarm.dapplets.org/files/ (see the example request after this list)
- After a while disk space runs out
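For reference, an upload request looked roughly like this (a sketch; example.pdf is a placeholder and the exact /files parameters depend on the Bee version):
curl -X POST -H "Content-Type: application/pdf" --data-binary @example.pdf "https://swarm.dapplets.org/files?name=example.pdf"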
Expected behavior
I expect to see 5 GB of free space :)
Actual behavior
- Disk space runs out
- The log contains a lot of errors about it
- Files cannot be uploaded; the node responds with HTTP 500 Internal Server Error
Config /etc/bee/bee.yaml
Uncommented lines from the config file:
api-addr: 127.0.0.1:1633
clef-signer-endpoint: /var/lib/bee-clef/clef.ipc
config: /etc/bee/bee.yaml
data-dir: /var/lib/bee
db-capacity: 2621440
gateway-mode: true
password-file: /var/lib/bee/password
swap-enable: true
swap-endpoint: https://rpc.slock.it/goerli
Thank you for reporting the bug! We will look into it shortly.
Tangential suggestion: it should be pretty easy to compute a rough estimate of the maximum possible disk usage from db-capacity
and compare that to the disk space actually available. If the two are grossly out of whack, bee should log a suggested db-capacity value
that will not exceed the available disk space. A sketch of such a check follows.
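Something along these lines could even be done outside bee today; a minimal sketch, assuming ~4 KiB per chunk, ignoring index overhead, and using the paths from the config above (GNU df):
#!/bin/sh
# read db-capacity from the config and estimate the maximum chunk data it allows
CONFIG=/etc/bee/bee.yaml
DATA_DIR=/var/lib/bee
capacity=$(awk '/^db-capacity:/ {print $2}' "$CONFIG")
est_bytes=$((capacity * 4096))
# free bytes on the filesystem that holds the data dir
avail_bytes=$(df -B1 --output=avail "$DATA_DIR" | tail -n 1 | tr -d ' ')
if [ "$est_bytes" -gt "$avail_bytes" ]; then
  echo "db-capacity ($capacity chunks, ~$((est_bytes / 1073741824)) GiB) exceeds free space"
  echo "a value of about $((avail_bytes / 4096)) chunks would fit the available space"
fi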
I can reproduce this problem. It looks like disk space accounting does not include uploaded files. When I restart swarm, immediately a ton of disk space is freed up as db-capacity
is re-applied.
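For anyone who wants to see this on their own node, this is roughly how I observe it (paths assume the default .deb/systemd install with data-dir /var/lib/bee):
du -sh /var/lib/bee/localstore   # size before restart
sudo systemctl restart bee
du -sh /var/lib/bee/localstore   # noticeably smaller once GC re-applies db-capacity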
Hm, bee isn't releasing all of the disk space even after a restart:
root@salvia /o/bee# grep db-cap /etc/bee/bee.yaml
db-capacity: 5000000
root@salvia /o/bee# ls
keys/ localstore/ password statestore/
root@salvia /o/bee# du -h -s .
111G .
- Can you say if at any point your db capacity was set above 5mil?
- Did you play around with the size?
Bee will not garbage collect your uploaded content before it is fully synced. You can track the progress of your uploads with the tags API (example below).
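A rough sketch of using the tags API for that (the header and field names have changed between Bee versions, and example.pdf is a placeholder, so treat this as an outline rather than exact syntax):
# create a tag, attach it to an upload, then poll it until synced approaches total
TAG=$(curl -s -X POST http://localhost:1633/tags | jq .uid)
curl -X POST -H "Swarm-Tag: $TAG" -H "Content-Type: application/pdf" --data-binary @example.pdf "http://localhost:1633/files?name=example.pdf"
curl -s http://localhost:1633/tags/$TAG | jq   # reports total vs. synced chunk counts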
Please try to give as much information as possible about what you had done before this problem surfaced. I'm trying to reproduce this but so far no luck.
Can you say if at any point your db capacity was set above 5mil?
Yes, I tried 10mil. Once I realized that disk space management wasn't working, I reduced it back to 5mil.
Did you play around with the size?
On one node, I probably uploaded faster than it was syncing. For example, I may have uploaded 30G of data to the node very quickly and then waited for it to sync.
I'm trying to reproduce this but so far no luck.
If you can provide some guidance about how not to trigger the issue, that would also help. I gather that I shouldn't mess with the db-capacity setting. Also, I should not upload too fast?
I was trying to find where the limits were, to help with testing, but I am content to play within expected user behavior too.
I'm curious to hear from @alsakhaev too
@Eknir @acud
Messages from the bee-support channel:
mfw78: I've found on 3x containers that I've run, all of them do not respect the db-capacity limit.
sig: are you uploading any data to them?
mfw78: No
+1: started a node on raspi with 32gb sd card, ran out of disk space after 10hrs
+1: have set up docker-based nodes and all of their localstores have easily surpassed the db-capacity limit and use between 30Gb and 40Gb now
+1: Running multiple bees in Kubernetes containers. Each bee exhausts its disk space allocation (doubling the db capacity has no effect besides chewing through more space and eventually exceeding that too).
Thanks, all, for the comments and reports. The upcoming release includes several improvements that aim to address this issue. We would greatly appreciate it if you could try it out and report back here.
I can confirm that with 0.5.3 the db-capacity seems to be respected more; the 6 nodes I'm running show the following disk usage: 28G / 21G / 28G / 28G / 29G / 27G
This issue can be reliably reproduced with an rPi.

@zelig @acud you guys are working on this as part of the postage stamps? Shall I assign this issue to the current sprint?
The bug has a severe impact on the entire network because people are just purging the localstore
of their nodes, causing data loss. There is no way to release bee without killing the bug.
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 5 days.
Any news on this? The issue is still there. I'm using the default disk space configuration (BEE_CACHE_CAPACITY=1000000), which should be ~4 GB, but my disk usage graph (attached) shows otherwise.
I didn't perform any uploads on the node. It's a VERY important issue to fix.
It should be resolved with the latest release. However, the problem is multi-tiered, so shipping a database migration that would fix a problem already exacerbated on some nodes was not trivial. If you db nuke
your node and allow it to resync, the problem should be resolved (see the sketch below).
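For a typical systemd install that would look something like this (command names and flags may differ between releases, so check bee db --help first):
sudo systemctl stop bee
bee db nuke --data-dir /var/lib/bee   # wipes the local chunk store so the node resyncs from scratch
sudo systemctl start bee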
Any plans to publish guidance on this? In particular, how to detect if the issue exists within a node so that we don't just start nuking everything and dropping retrievability on chunks already stored in the swarm.
I've db nuked two of my nodes; let's see how it evolves.
@ldeffenb @tmm360 do you still experience this issue?
Disk usage seems stable and not growing.
Yesterday I installed bee 1.5.0-dda5606e. The sharky migration finished, but disk consumption doubled. How can I delete the old database and expire the blocks that shouldn't be stored?
I deleted the old database using bee nuke. Two weeks ago, disk usage was back to zero. As I write, disk usage is back up to 30GiB.
Good amount of traffic today. Disk usage is up to 34.8GiB.
I upgraded to 1.5.1 today. Disk usage is up to 47.9GiB. At this rate, I'll have to nuke my db again in a few weeks.
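This is roughly how I'm tracking the growth, in case it helps (path assumes the default data-dir):
# re-check the localstore size every 10 minutes
watch -n 600 du -sh /var/lib/bee/localstore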
If you run the following command, substituting the proper IP and debug port, what value does it display? It should be 2 or 3 on testnet and 8 or 9 on mainnet.
curl http://127.0.0.1:1635/topology | jq .depth
I'm on mainnet. Currently it says 6
Are you sure you have inbound connections open and forwarded to your p2p-addr (default 1634)? With a depth of only 6, it seems that you may not be receiving inbound connections. A shallower depth may cause your node to believe it needs to store more chunks as the neighborhood is larger.
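A quick way to check that, assuming the default ports (the public IP below is a placeholder):
# peer count as seen by the debug API
curl -s http://127.0.0.1:1635/peers | jq '.peers | length'
# from a machine outside your network, verify the p2p port is reachable
nc -vz <your-public-ip> 1634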