used-space-filewalker doesn't update the usage correctly after restart
Description
After restart of storagenode, the used-space-filewalker doesn't update the usage since v1.115.4
See https://forum.storj.io/t/used-space-gets-reset-after-restart/28408?u=alexey
Steps to reproduce the issue:
- used
storj-upenvironment for 1.115.4 or later - upload several 128MiB files with TTL +2m (or any)
- run
docker compose exec storagenode1 storagenode dashboard --address=:30002to see a dashboard - Wait for the TTL collector chore to complete
- Check the dashboard on the next day or two - it should show some non-zero value (perhaps the TTL collector doesn't update the usage again or not in all databases - we now have an additional database
used_space_per_prefix.db) - upload several 128MiB files with TTL
- Make sure, that the usage value on the dashboard has changed
- Restart the node:
docker compose exec storagenode1 rm /var/lib/storj/.local/share/storj/storagenode/storage/trash/.trash-uses-day-dirs-indicator && docker compose restart storagenode1
- The value would be the same, which was before uploading additional 128MiB files.
- after the used-space-filewalker has completed the scan, the value will not change.
Logs:
$ docker compose exec storagenode1 storagenode dashboard --address=:30002
Storage Node Dashboard ( Node Version: v0.0.0 )
======================
ID 12FjPhvS8uEZhCLGXLergiC5WzVeRe6Uqh6HFN1uw86834kK2sV
Status ONLINE
Uptime 1m49s
Available Used Egress Ingress
Bandwidth N/A 0 B 0 B 0 B (since Nov 1)
Disk 0.93 GB 67.26 MB
Internal :30002
External storagenode1:30001
$ docker compose exec storagenode1 du --si -d 1 /var/lib/storj/.local/share/storj/storagenode/storage/
8.2k /var/lib/storj/.local/share/storj/storagenode/storage/piece_expirations
8.2k /var/lib/storj/.local/share/storj/storagenode/storage/trash
4.1k /var/lib/storj/.local/share/storj/storagenode/storage/temp
46k /var/lib/storj/.local/share/storj/storagenode/storage/blobs
1.7M /var/lib/storj/.local/share/storj/storagenode/storage/
$ docker compose exec storagenode1 rm -f /var/lib/storj/.local/share/storj/storagenode/storage/trash/.trash-uses-day-dirs-indicator && docker compose restart storagenode1
$ docker compose logs storagenode1 | grep "\sused-space" | tail
storagenode1-1 | 2024-11-24T02:56:57Z INFO pieces pieces/store.go:738 used-space-filewalker started {"Process": "storagenode", "Satellite ID": "12whfK1EDvHJtajBiAUeajQLYcWqxcQmdYQU5zX5cCf6bAxfgu4"}
storagenode1-1 | 2024-11-24T02:56:57Z INFO pieces pieces/store.go:761 used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "12whfK1EDvHJtajBiAUeajQLYcWqxcQmdYQU5zX5cCf6bAxfgu4", "Lazy File Walker": false, "Total Pieces Size": 67256320, "Total Pieces Content Size": 67254272, "Total Pieces Count": 4, "Duration": "890.394µs"}
storagenode1-1 | 2024-11-24T02:57:00Z INFO pieces pieces/store.go:738 used-space-filewalker started {"Process": "storagenode", "Satellite ID": "12whfK1EDvHJtajBiAUeajQLYcWqxcQmdYQU5zX5cCf6bAxfgu4"}
storagenode1-1 | 2024-11-24T02:57:00Z INFO pieces pieces/store.go:761 used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "12whfK1EDvHJtajBiAUeajQLYcWqxcQmdYQU5zX5cCf6bAxfgu4", "Lazy File Walker": false, "Total Pieces Size": 67256320, "Total Pieces Content Size": 67254272, "Total Pieces Count": 4, "Duration": "701.095µs"}
storagenode1-1 | 2024-11-24T02:57:03Z INFO pieces pieces/store.go:738 used-space-filewalker started {"Process": "storagenode", "Satellite ID": "12whfK1EDvHJtajBiAUeajQLYcWqxcQmdYQU5zX5cCf6bAxfgu4"}
storagenode1-1 | 2024-11-24T02:57:03Z INFO pieces pieces/store.go:761 used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "12whfK1EDvHJtajBiAUeajQLYcWqxcQmdYQU5zX5cCf6bAxfgu4", "Lazy File Walker": false, "Total Pieces Size": 67256320, "Total Pieces Content Size": 67254272, "Total Pieces Count": 4, "Duration": "682.196µs"}
storagenode1-1 | 2024-11-24T02:57:06Z INFO pieces pieces/store.go:738 used-space-filewalker started {"Process": "storagenode", "Satellite ID": "12whfK1EDvHJtajBiAUeajQLYcWqxcQmdYQU5zX5cCf6bAxfgu4"}
storagenode1-1 | 2024-11-24T02:57:06Z INFO pieces pieces/store.go:761 used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "12whfK1EDvHJtajBiAUeajQLYcWqxcQmdYQU5zX5cCf6bAxfgu4", "Lazy File Walker": false, "Total Pieces Size": 67256320, "Total Pieces Content Size": 67254272, "Total Pieces Count": 4, "Duration": "697.895µs"}
storagenode1-1 | 2024-11-24T02:57:09Z INFO pieces pieces/store.go:738 used-space-filewalker started {"Process": "storagenode", "Satellite ID": "12whfK1EDvHJtajBiAUeajQLYcWqxcQmdYQU5zX5cCf6bAxfgu4"}
storagenode1-1 | 2024-11-24T02:57:09Z INFO pieces pieces/store.go:761 used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "12whfK1EDvHJtajBiAUeajQLYcWqxcQmdYQU5zX5cCf6bAxfgu4", "Lazy File Walker": false, "Total Pieces Size": 67256320, "Total Pieces Content Size": 67254272, "Total Pieces Count": 4, "Duration": "4.243171ms"}
$ docker compose exec storagenode1 storagenode dashboard --address=:30002
Storage Node Dashboard ( Node Version: v0.0.0 )
======================
ID 12FjPhvS8uEZhCLGXLergiC5WzVeRe6Uqh6HFN1uw86834kK2sV
Status ONLINE
Uptime 18s
Available Used Egress Ingress
Bandwidth N/A 0 B 0 B 0 B (since Nov 1)
Disk 0.93 GB 67.26 MB
Internal :30002
External storagenode1:30001
$ docker compose exec storagenode1 rm -f /var/lib/storj/.local/share/storj/storagenode/storage/used_space_per_prefix.db /var/lib/storj/.local/share/storj/storagenode/storage/trash/.trash-uses-day-dirs-indicator && docker compose restart storagenode1
$ docker compose exec storagenode1 storagenode dashboard --address=:30002
Storage Node Dashboard ( Node Version: v0.0.0 )
======================
ID 12FjPhvS8uEZhCLGXLergiC5WzVeRe6Uqh6HFN1uw86834kK2sV
Status ONLINE
Uptime 9s
Available Used Egress Ingress
Bandwidth N/A 0 B 0 B 0 B (since Nov 1)
Disk 1.00 GB 0 B
Internal :30002
External storagenode1:30001
Describe the results you expected:
The used-space-filewalker should calculate the used space correctly
Describe the results you received:
The used-space-filewalker was finish in seconds, the used space will be invalid.
Possible Fix
Remove the used_space_per_prefix.db database and restart the node, the used space value will be updated correctly.
Your environment
- Operating system and version: any
- Additional environment details (Raspberry PI, Docker, VMWare, etc.): any
This issue has been mentioned on Storj Community Forum (official). There might be relevant details there:
https://forum.storj.io/t/used-space-gets-reset-after-restart/28408/61
v1.115.5, this node with the enabled badger cache and disabled lazy, no additional files have been uploaded, just the next day:
Storage Node Dashboard ( Node Version: v0.0.0 )
======================
ID 1pu641mthtErUWNBJCwSCv88sDBMGk9Z6yL5NATj7XWEpux9Rt
Status ONLINE
Uptime 20h59m37s
Available Used Egress Ingress
Bandwidth N/A 0 B 0 B 0 B (since Nov 1)
Disk 1.00 GB 1.02 KB
Internal :30022
External storagenode3:30021
^C2024-11-27T03:17:18Z INFO process/exec_conf.go:140 Got a signal from the OS: "interrupt" {"Process": "storagenode"}
2024-11-27T03:17:19Z DEBUG process/exec_conf.go:425 Unrecoverable error {"Process": "storagenode", "error": "context canceled"}
Error: context canceled
ubuntu@storj-up:~/build$ docker compose exec storagenode3 du --si -d 1 /var/lib/storj/.local/share/storj/storagenode/storage/
WARN[0000] /home/ubuntu/build/docker-compose.yaml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion
8.2k /var/lib/storj/.local/share/storj/storagenode/storage/piece_expirations
8.2k /var/lib/storj/.local/share/storj/storagenode/storage/trash
4.1k /var/lib/storj/.local/share/storj/storagenode/storage/temp
33k /var/lib/storj/.local/share/storj/storagenode/storage/filestatcache
25k /var/lib/storj/.local/share/storj/storagenode/storage/blobs
771k /var/lib/storj/.local/share/storj/storagenode/storage/
$ docker compose exec storagenode3 rm -f /var/lib/storj/.local/share/storj/storagenode/storage/trash/.trash-uses-day-dirs-indicator && docker compose restart storagenode3
WARN[0000] /home/ubuntu/build/docker-compose.yaml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion
WARN[0000] /home/ubuntu/build/docker-compose.yaml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion
[+] Restarting 1/1
✔ Container build-storagenode3-1 Started
$ docker compose exec storagenode3 storagenode dashboard --address=:30022
Storage Node Dashboard ( Node Version: v0.0.0 )
======================
ID 1pu641mthtErUWNBJCwSCv88sDBMGk9Z6yL5NATj7XWEpux9Rt
Status ONLINE
Uptime 46s
Available Used Egress Ingress
Bandwidth N/A 0 B 0 B 0 B (since Nov 1)
Disk 0.97 GB 33.63 MB
Internal :30022
External storagenode3:30021
Any updates? Same behaviours with 1.116.7
Same behaviours with v1.117.8
Any updates? I've tested the last 2 releases and the issue is still there in 1.116.7 and 1.115.5
Hi, is this bug still there? If so, can i work on it as an open src contributor? pls advise
Yes, it's still here. I believe you can work on it.
Do we know which git commit introduced it? I saw a big refactoring about doing some incremental snapshot/freeze/continue on node reboot with the work incomplete, so as to continue the next bootup with counting and categorizing to trash vs stored, so that is my current suspicion, but pls confirm, i will check..
What about this part of the code? https://github.com/storj/storj/blob/main/storagenode/pieces/filewalker.go#L93 I've seen the used_space_per_prefix.db got introduced with the resume feature . Does the space get updated with new pieces or GC also or is this DB filewalker exclusive? That could explain a restart uses old, up to 7 Days old, used space.