storj icon indicating copy to clipboard operation
storj copied to clipboard

used-space-filewalker doesn't update the usage correctly after restart

Open AlexeyALeonov opened this issue 1 year ago • 9 comments

Description After restart of storagenode, the used-space-filewalker doesn't update the usage since v1.115.4

See https://forum.storj.io/t/used-space-gets-reset-after-restart/28408?u=alexey

Steps to reproduce the issue:

  1. used storj-up environment for 1.115.4 or later
  2. upload several 128MiB files with TTL +2m (or any)
  3. run docker compose exec storagenode1 storagenode dashboard --address=:30002 to see a dashboard
  4. Wait for the TTL collector chore to complete
  5. Check the dashboard on the next day or two - it should show some non-zero value (perhaps the TTL collector doesn't update the usage again or not in all databases - we now have an additional database used_space_per_prefix.db)
  6. upload several 128MiB files with TTL
  7. Make sure, that the usage value on the dashboard has changed
  8. Restart the node:
docker compose exec storagenode1 rm /var/lib/storj/.local/share/storj/storagenode/storage/trash/.trash-uses-day-dirs-indicator && docker compose restart storagenode1
  1. The value would be the same, which was before uploading additional 128MiB files.
  2. after the used-space-filewalker has completed the scan, the value will not change.

Logs:

$ docker compose exec storagenode1 storagenode dashboard --address=:30002

Storage Node Dashboard ( Node Version: v0.0.0 )

======================

ID     12FjPhvS8uEZhCLGXLergiC5WzVeRe6Uqh6HFN1uw86834kK2sV
Status ONLINE
Uptime 1m49s

                   Available         Used     Egress     Ingress
     Bandwidth           N/A          0 B        0 B         0 B (since Nov 1)
          Disk       0.93 GB     67.26 MB
Internal :30002
External storagenode1:30001
$ docker compose exec storagenode1 du --si -d 1 /var/lib/storj/.local/share/storj/storagenode/storage/
8.2k    /var/lib/storj/.local/share/storj/storagenode/storage/piece_expirations
8.2k    /var/lib/storj/.local/share/storj/storagenode/storage/trash
4.1k    /var/lib/storj/.local/share/storj/storagenode/storage/temp
46k     /var/lib/storj/.local/share/storj/storagenode/storage/blobs
1.7M    /var/lib/storj/.local/share/storj/storagenode/storage/
$ docker compose exec storagenode1 rm -f /var/lib/storj/.local/share/storj/storagenode/storage/trash/.trash-uses-day-dirs-indicator && docker compose restart storagenode1
$ docker compose logs storagenode1 | grep "\sused-space" | tail
storagenode1-1  | 2024-11-24T02:56:57Z  INFO    pieces  pieces/store.go:738     used-space-filewalker started   {"Process": "storagenode", "Satellite ID": "12whfK1EDvHJtajBiAUeajQLYcWqxcQmdYQU5zX5cCf6bAxfgu4"}
storagenode1-1  | 2024-11-24T02:56:57Z  INFO    pieces  pieces/store.go:761     used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "12whfK1EDvHJtajBiAUeajQLYcWqxcQmdYQU5zX5cCf6bAxfgu4", "Lazy File Walker": false, "Total Pieces Size": 67256320, "Total Pieces Content Size": 67254272, "Total Pieces Count": 4, "Duration": "890.394µs"}
storagenode1-1  | 2024-11-24T02:57:00Z  INFO    pieces  pieces/store.go:738     used-space-filewalker started   {"Process": "storagenode", "Satellite ID": "12whfK1EDvHJtajBiAUeajQLYcWqxcQmdYQU5zX5cCf6bAxfgu4"}
storagenode1-1  | 2024-11-24T02:57:00Z  INFO    pieces  pieces/store.go:761     used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "12whfK1EDvHJtajBiAUeajQLYcWqxcQmdYQU5zX5cCf6bAxfgu4", "Lazy File Walker": false, "Total Pieces Size": 67256320, "Total Pieces Content Size": 67254272, "Total Pieces Count": 4, "Duration": "701.095µs"}
storagenode1-1  | 2024-11-24T02:57:03Z  INFO    pieces  pieces/store.go:738     used-space-filewalker started   {"Process": "storagenode", "Satellite ID": "12whfK1EDvHJtajBiAUeajQLYcWqxcQmdYQU5zX5cCf6bAxfgu4"}
storagenode1-1  | 2024-11-24T02:57:03Z  INFO    pieces  pieces/store.go:761     used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "12whfK1EDvHJtajBiAUeajQLYcWqxcQmdYQU5zX5cCf6bAxfgu4", "Lazy File Walker": false, "Total Pieces Size": 67256320, "Total Pieces Content Size": 67254272, "Total Pieces Count": 4, "Duration": "682.196µs"}
storagenode1-1  | 2024-11-24T02:57:06Z  INFO    pieces  pieces/store.go:738     used-space-filewalker started   {"Process": "storagenode", "Satellite ID": "12whfK1EDvHJtajBiAUeajQLYcWqxcQmdYQU5zX5cCf6bAxfgu4"}
storagenode1-1  | 2024-11-24T02:57:06Z  INFO    pieces  pieces/store.go:761     used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "12whfK1EDvHJtajBiAUeajQLYcWqxcQmdYQU5zX5cCf6bAxfgu4", "Lazy File Walker": false, "Total Pieces Size": 67256320, "Total Pieces Content Size": 67254272, "Total Pieces Count": 4, "Duration": "697.895µs"}
storagenode1-1  | 2024-11-24T02:57:09Z  INFO    pieces  pieces/store.go:738     used-space-filewalker started   {"Process": "storagenode", "Satellite ID": "12whfK1EDvHJtajBiAUeajQLYcWqxcQmdYQU5zX5cCf6bAxfgu4"}
storagenode1-1  | 2024-11-24T02:57:09Z  INFO    pieces  pieces/store.go:761     used-space-filewalker completed {"Process": "storagenode", "Satellite ID": "12whfK1EDvHJtajBiAUeajQLYcWqxcQmdYQU5zX5cCf6bAxfgu4", "Lazy File Walker": false, "Total Pieces Size": 67256320, "Total Pieces Content Size": 67254272, "Total Pieces Count": 4, "Duration": "4.243171ms"}
$ docker compose exec storagenode1 storagenode dashboard --address=:30002

Storage Node Dashboard ( Node Version: v0.0.0 )

======================

ID     12FjPhvS8uEZhCLGXLergiC5WzVeRe6Uqh6HFN1uw86834kK2sV
Status ONLINE
Uptime 18s

                   Available         Used     Egress     Ingress
     Bandwidth           N/A          0 B        0 B         0 B (since Nov 1)
          Disk       0.93 GB     67.26 MB
Internal :30002
External storagenode1:30001
$ docker compose exec storagenode1 rm -f /var/lib/storj/.local/share/storj/storagenode/storage/used_space_per_prefix.db /var/lib/storj/.local/share/storj/storagenode/storage/trash/.trash-uses-day-dirs-indicator && docker compose restart storagenode1
$ docker compose exec storagenode1 storagenode dashboard --address=:30002

Storage Node Dashboard ( Node Version: v0.0.0 )

======================

ID     12FjPhvS8uEZhCLGXLergiC5WzVeRe6Uqh6HFN1uw86834kK2sV
Status ONLINE
Uptime 9s

                   Available     Used     Egress     Ingress
     Bandwidth           N/A      0 B        0 B         0 B (since Nov 1)
          Disk       1.00 GB      0 B
Internal :30002
External storagenode1:30001

Describe the results you expected: The used-space-filewalker should calculate the used space correctly

Describe the results you received: The used-space-filewalker was finish in seconds, the used space will be invalid.

Possible Fix

Remove the used_space_per_prefix.db database and restart the node, the used space value will be updated correctly.

Your environment

  • Operating system and version: any
  • Additional environment details (Raspberry PI, Docker, VMWare, etc.): any

AlexeyALeonov avatar Nov 23 '24 07:11 AlexeyALeonov

This issue has been mentioned on Storj Community Forum (official). There might be relevant details there:

https://forum.storj.io/t/used-space-gets-reset-after-restart/28408/61

storjrobot avatar Nov 23 '24 07:11 storjrobot

v1.115.5, this node with the enabled badger cache and disabled lazy, no additional files have been uploaded, just the next day:

Storage Node Dashboard ( Node Version: v0.0.0 )

======================

ID     1pu641mthtErUWNBJCwSCv88sDBMGk9Z6yL5NATj7XWEpux9Rt
Status ONLINE
Uptime 20h59m37s

                   Available        Used     Egress     Ingress
     Bandwidth           N/A         0 B        0 B         0 B (since Nov 1)
          Disk       1.00 GB     1.02 KB
Internal :30022
External storagenode3:30021
^C2024-11-27T03:17:18Z  INFO    process/exec_conf.go:140        Got a signal from the OS: "interrupt"   {"Process": "storagenode"}
2024-11-27T03:17:19Z    DEBUG   process/exec_conf.go:425        Unrecoverable error     {"Process": "storagenode", "error": "context canceled"}
Error: context canceled
ubuntu@storj-up:~/build$ docker compose exec storagenode3 du --si -d 1 /var/lib/storj/.local/share/storj/storagenode/storage/
WARN[0000] /home/ubuntu/build/docker-compose.yaml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion
8.2k    /var/lib/storj/.local/share/storj/storagenode/storage/piece_expirations
8.2k    /var/lib/storj/.local/share/storj/storagenode/storage/trash
4.1k    /var/lib/storj/.local/share/storj/storagenode/storage/temp
33k     /var/lib/storj/.local/share/storj/storagenode/storage/filestatcache
25k     /var/lib/storj/.local/share/storj/storagenode/storage/blobs
771k    /var/lib/storj/.local/share/storj/storagenode/storage/
$ docker compose exec storagenode3 rm -f /var/lib/storj/.local/share/storj/storagenode/storage/trash/.trash-uses-day-dirs-indicator && docker compose restart storagenode3
WARN[0000] /home/ubuntu/build/docker-compose.yaml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion
WARN[0000] /home/ubuntu/build/docker-compose.yaml: the attribute `version` is obsolete, it will be ignored, please remove it to avoid potential confusion
[+] Restarting 1/1
 ✔ Container build-storagenode3-1  Started  
$ docker compose exec storagenode3 storagenode dashboard --address=:30022

Storage Node Dashboard ( Node Version: v0.0.0 )

======================

ID     1pu641mthtErUWNBJCwSCv88sDBMGk9Z6yL5NATj7XWEpux9Rt
Status ONLINE
Uptime 46s

                   Available         Used     Egress     Ingress
     Bandwidth           N/A          0 B        0 B         0 B (since Nov 1)
          Disk       0.97 GB     33.63 MB
Internal :30022
External storagenode3:30021

AlexeyALeonov avatar Nov 27 '24 03:11 AlexeyALeonov

Any updates? Same behaviours with 1.116.7

foegra avatar Dec 10 '24 20:12 foegra

Same behaviours with v1.117.8

surfercool avatar Dec 20 '24 11:12 surfercool

Any updates? I've tested the last 2 releases and the issue is still there in 1.116.7 and 1.115.5

Juberstine avatar Dec 23 '24 02:12 Juberstine

Hi, is this bug still there? If so, can i work on it as an open src contributor? pls advise

nmarasoiu avatar Feb 26 '25 15:02 nmarasoiu

Yes, it's still here. I believe you can work on it.

AlexeyALeonov avatar Feb 26 '25 16:02 AlexeyALeonov

Do we know which git commit introduced it? I saw a big refactoring about doing some incremental snapshot/freeze/continue on node reboot with the work incomplete, so as to continue the next bootup with counting and categorizing to trash vs stored, so that is my current suspicion, but pls confirm, i will check..

nmarasoiu avatar Feb 27 '25 12:02 nmarasoiu

What about this part of the code? https://github.com/storj/storj/blob/main/storagenode/pieces/filewalker.go#L93 I've seen the used_space_per_prefix.db got introduced with the resume feature . Does the space get updated with new pieces or GC also or is this DB filewalker exclusive? That could explain a restart uses old, up to 7 Days old, used space.

revyte avatar Jun 11 '25 23:06 revyte