v0.4.0-beta1 upgrade from v0.3.0
During the upgrade, one host fails to start with the following error:
2023-04-14T03:55:31Z app[91857211b0e283] lax [info]config file read from /etc/litefs.yml
2023-04-14T03:55:31Z app[91857211b0e283] lax [info]LiteFS v0.4.0-beta1, commit=6421a2236f96b1d7bd88d18a1cc46aa1fd5c2ae7
2023-04-14T03:55:31Z app[91857211b0e283] lax [info]no proxy target set, skipping proxy
2023-04-14T03:55:31Z app[91857211b0e283] lax [info]Using Consul to determine primary
2023-04-14T03:55:31Z app[91857211b0e283] lax [info]initializing consul: key=litefs/someapp
2023-04-14T03:55:31Z app[91857211b0e283] lax [info]ERROR: cannot open store: open databases: open database("database"): init from database header: open /mnt/litefs/dbs/database/database: not a directory
2023-04-14T03:55:31Z app[91857211b0e283] lax [info]Starting clean up.
2023-04-14T03:55:31Z app[91857211b0e283] lax [info]Umounting /dev/vdb from /mnt/litefs
2023-04-14T03:55:32Z app[91857211b0e283] lax [info][ 10.188133] reboot: Restarting system
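The "not a directory" error points at /mnt/litefs/dbs/database/database, so LiteFS apparently expects dbs/database to be a per-database directory, but on this host it exists as a plain file. A quick way to confirm that on the failing machine (paths are just from my setup's data directory; adjust to whatever your litefs.yml uses):

# /mnt/litefs is the data directory on this host; substitute your own.
stat /mnt/litefs/dbs/database                 # turns out to be a zero-length regular file, not a directory (see listing below)
find /mnt/litefs/dbs -maxdepth 1 ! -type d    # any non-directory entry directly under dbs/ is suspect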
The contents of /mnt/litefs on the failing host:
2 4 drwxr-xr-x 4 root root 4096 Apr 5 08:37 /mnt/litefs
11 16 drwx------ 2 root root 16384 Apr 5 08:36 /mnt/litefs/lost+found
12 4 -rw-r--r-- 1 root root 25 Apr 5 08:37 /mnt/litefs/id
8161 4 drwxr-xr-x 5 root root 4096 Apr 14 03:51 /mnt/litefs/dbs
13 4 drwxr-xr-x 3 root root 4096 Apr 14 03:17 /mnt/litefs/dbs/main
17 32 -rw-r--r-- 1 root root 32768 Apr 14 03:17 /mnt/litefs/dbs/main/shm
14 54432 -rw-r--r-- 1 root root 55738368 Apr 14 03:17 /mnt/litefs/dbs/main/database
15 4 drwxr-xr-x 2 root root 4096 Apr 5 08:37 /mnt/litefs/dbs/main/ltx
16 54488 -rw-r--r-- 1 root root 55792920 Apr 5 08:37 /mnt/litefs/dbs/main/ltx/0000000000000001-0000000000000001.ltx
24 0 -rw-r--r-- 1 root root 0 Apr 14 03:51 /mnt/litefs/dbs/database
18 4 drwxr-xr-x 3 root root 4096 Apr 14 03:17 /mnt/litefs/dbs/main.sqlite
23 0 -rw-r--r-- 1 root root 0 Apr 14 03:52 /mnt/litefs/dbs/main.sqlite/wal
21 32 -rw-r--r-- 1 root root 32768 Apr 14 03:52 /mnt/litefs/dbs/main.sqlite/shm
19 40280 -rw-r--r-- 1 root root 41246720 Apr 14 03:49 /mnt/litefs/dbs/main.sqlite/database
20 4 drwxr-xr-x 2 root root 4096 Apr 14 03:49 /mnt/litefs/dbs/main.sqlite/ltx
26 28 -rw-r--r-- 1 root root 24720 Apr 14 03:49 /mnt/litefs/dbs/main.sqlite/ltx/00000000000003ec-00000000000003ec.ltx
22 8 -rw-r--r-- 1 root root 4220 Apr 14 03:49 /mnt/litefs/dbs/main.sqlite/ltx/00000000000003ea-00000000000003ea.ltx
25 8 -rw-r--r-- 1 root root 4220 Apr 14 03:49 /mnt/litefs/dbs/main.sqlite/ltx/00000000000003eb-00000000000003eb.ltx
27 4 drwxr-xr-x 2 root root 4096 Apr 14 03:51 /mnt/litefs/dbs/ltx
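The entries that stand out, compared to the healthy host below, are the zero-length /mnt/litefs/dbs/database file and the empty /mnt/litefs/dbs/ltx directory, neither of which appears on the host that upgraded cleanly. (The listing above was produced with something along the lines of "find /mnt/litefs -ls"; to show just the stray file, a filter like this works:)

find /mnt/litefs/dbs -maxdepth 1 -type f -empty    # empty regular files sitting directly under dbs/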
From a system that completed the upgrade successfully:
2 4 drwxr-xr-x 4 root root 4096 Apr 2 03:10 /mnt/litefs
12 4 -rw-r--r-- 1 root root 25 Apr 2 03:10 /mnt/litefs/id
8161 4 drwxr-xr-x 4 root root 4096 Apr 2 03:10 /mnt/litefs/dbs
13 4 drwxr-xr-x 3 root root 4096 Apr 14 03:50 /mnt/litefs/dbs/main
14 54432 -rw-r--r-- 1 root root 55738368 Apr 14 03:50 /mnt/litefs/dbs/main/database
17 32 -rw-r--r-- 1 root root 32768 Apr 14 03:50 /mnt/litefs/dbs/main/shm
15 4 drwxr-xr-x 2 root root 4096 Apr 2 03:10 /mnt/litefs/dbs/main/ltx
16 54488 -rw-r--r-- 1 root root 55792920 Apr 2 03:10 /mnt/litefs/dbs/main/ltx/0000000000000001-0000000000000001.ltx
18 4 drwxr-xr-x 3 root root 4096 Apr 14 03:50 /mnt/litefs/dbs/main.sqlite
19 40280 -rw-r--r-- 1 root root 41246720 Apr 14 03:50 /mnt/litefs/dbs/main.sqlite/database
23 80 -rw-r--r-- 1 root root 78312 Apr 14 04:08 /mnt/litefs/dbs/main.sqlite/wal
21 32 -rw-r--r-- 1 root root 32768 Apr 14 04:08 /mnt/litefs/dbs/main.sqlite/shm
20 16 drwxr-xr-x 2 root root 16384 Apr 14 04:08 /mnt/litefs/dbs/main.sqlite/ltx
24 4 -rw-r--r-- 1 root root 2820 Apr 14 04:08 /mnt/litefs/dbs/main.sqlite/ltx/00000000000003f1-00000000000003f1.ltx
22 12 -rw-r--r-- 1 root root 12198 Apr 14 04:07 /mnt/litefs/dbs/main.sqlite/ltx/00000000000003f0-00000000000003f0.ltx
25 4 -rw-r--r-- 1 root root 1751 Apr 14 04:08 /mnt/litefs/dbs/main.sqlite/ltx/00000000000003f2-00000000000003f2.ltx
11 16 drwx------ 2 root root 16384 Apr 2 03:10 /mnt/litefs/lost+found
After "rm /mnt/litefs/dbs/database" it appears to come up cleanly and latest data is there.
2023-04-14T04:23:58Z app[91857211b0e283] lax [info]database file does not exist on initialization: /mnt/litefs/dbs/ltx/database
2023-04-14T04:23:58Z app[91857211b0e283] lax [info]wal-sync: no wal file exists on "main", skipping sync with ltx
2023-04-14T04:23:59Z app[91857211b0e283] lax [info]wal-sync: short wal file exists on "main.sqlite", skipping sync with ltx
2023-04-14T04:23:59Z app[91857211b0e283] lax [info]LiteFS mounted to: /litefs
2023-04-14T04:23:59Z app[91857211b0e283] lax [info]http server listening on: http://localhost:20202
2023-04-14T04:23:59Z app[91857211b0e283] lax [info]waiting to connect to cluster
2023-04-14T04:23:59Z app[91857211b0e283] lax [info]FC1A4EAE16BD02A6: existing primary found (3287440dc92185), connecting as replica
2023-04-14T04:23:59Z app[3287440dc92185] ord [info]2F8633F30B331BF1: stream connected
2023-04-14T04:23:59Z app[3287440dc92185] ord [info]transaction file for txid 00000000000003ed no longer available, writing snapshot
2023-04-14T04:23:59Z app[3287440dc92185] ord [info]writing snapshot "main.sqlite" @ 00000000000003f2
2023-04-14T04:24:00Z app[91857211b0e283] lax [info]snapshot received for "main.sqlite", removing other ltx files: 0000000000000001-00000000000003f2.ltx
2023-04-14T04:24:00Z app[91857211b0e283] lax [info]connected to cluster, ready
2023-04-14T04:24:00Z app[91857211b0e283] lax [info]starting background subprocess: /app/bin/server []
2023-04-14T04:24:00Z app[91857211b0e283] lax [info]waiting for signal or subprocess to exit
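For anyone else who runs into this, the workaround amounts to removing the stray zero-length file and restarting. It's worth confirming the file really is empty (and having a volume backup) before deleting anything; paths are from my setup:

ls -l /mnt/litefs/dbs/database   # confirm it's the 0-byte stray file, not a real database
rm /mnt/litefs/dbs/database      # then restart; the node rejoins as a replica and re-syncs via snapshot, per the log above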
That looks better after removing that file. I'm not sure why that happened. I have some docs to push up around the v0.4.0 changes tomorrow too. I'm glad the upgrade went smoothly apart from the odd zero-length file.
I'll see if I can reproduce that tomorrow.
@tjheeta I didn't have any luck reproducing this, and I don't see any obvious reason from the code why it would happen. I'll leave this issue open for now in case anyone else hits the same problem, and we can investigate further then.