juicefs icon indicating copy to clipboard operation
juicefs copied to clipboard

Boot process gets stucked when mounting via systemd-mount

Open lowshoe opened this issue 2 years ago • 6 comments

When trying to automatically mount a juicefs volume via a systemd.mount-unit, the boot process gets stucked after "Mounted JuiceFS".

image

The boot process halts at this point, the following services, for example the ssh-daemon, don't get started.

What you expected to happen: The juicefs-Volume gets mounted and the boot process then runs completely through to the end.

How to reproduce it (as minimally and precisely as possible):

Create and enable a systemd.mount as documented at https://juicefs.com/docs/community/mount_juicefs_at_boot_time#automating-mounting-with-systemdmount

For Example:

systemctl cat mnt-s3.mount

[Unit]
Description=Juicefs

[Mount]
Environment="META_PASSWORD=abcdefg12345"
What=rediss://my.keydb.local:6379/1
Where=/mnt/s3
Type=juicefs
Options=_netdev,user_id=1002,group_id=1002,cache-size=1024000

[Install]
WantedBy=remote-fs.target
WantedBy=multi-user.target

Anything else we need to know?

  • I also tried with the IP-Address of the KeyDB-Server instead of the FQDN to access the KeyDB-Server but this results in the same error.
  • The mount via fstab at boot with the same mount options works.
rediss://:[email protected]:6379/1  /mnt/s3  juicefs  _netdev,user_id=1002,group_id=1002,cache-size=1024000  0 0
  • the manual mount through the systemd mount after booting works too:
systemctl start mnt-s3.mount

systemctl status mnt-s3.mount  --no-pager
● mnt-s3.mount - Juicefs
     Loaded: loaded (/etc/systemd/system/mnt-s3.mount; disabled; preset: disabled)
     Active: active (mounted) since Thu 2023-10-19 14:59:21 CEST; 53s ago
      Until: Thu 2023-10-19 14:59:21 CEST; 53s ago
      Where: /mnt/s3
       What: JuiceFS:myjfs
      Tasks: 12 (limit: 153401)
     Memory: 107.3M
        CPU: 309ms
     CGroup: /system.slice/mnt-s3.mount
             └─1551 "/sbin/mount.juicefs rediss://my.keydb.local:6379/1 /mnt/s3 -o rw,user_id=1002,group_id=1002,cache-size=1024000,_netdev"

Oct 19 14:59:21 my.host.local mount[1528]: 2023/10/19 14:59:21.087691 juicefs[1528] <INFO>: Meta address: rediss://my.redis.local:6379/1 [interface.go:497]
Oct 19 14:59:21 my.host.local mount[1528]: 2023/10/19 14:59:21.123640 juicefs[1528] <WARNING>: AOF is not enabled, you may lose data if Redis is not shutdown pro… [info.go:84]
Oct 19 14:59:21 my.host.local mount[1528]: 2023/10/19 14:59:21.124005 juicefs[1528] <INFO>: Ping redis latency: 297.474µs [redis.go:3572]
Oct 19 14:59:21 my.host.local mount[1528]: 2023/10/19 14:59:21.125009 juicefs[1528] <INFO>: Data use s3://vol-www/myjfs/ [mount.go:605]
Oct 19 14:59:21 my.host.local mount[1528]: 2023/10/19 14:59:21.125910 juicefs[1528] <INFO>: Disk cache (/var/jfsCache/96a19d9e-6e9b-44e2-b202-4d87592015d0/): cap…cache.go:114]
Oct 19 14:59:21 my.host.local /sbin/mount.juicefs[1551]: juicefs[1551] <INFO>: Create session 35 OK with version: 1.1.0+2023-09-04.08c4ae6 [base.go:492]
Oct 19 14:59:21 my.host.local /sbin/mount.juicefs[1551]: juicefs[1551] <INFO>: Prometheus metrics listening on 127.0.0.1:9567 [mount.go:160]
Oct 19 14:59:21 my.host.local /sbin/mount.juicefs[1551]: juicefs[1551] <INFO>: Mounting volume myjfs at /mnt/s3 ... [mount_unix.go:269]
Oct 19 14:59:21 my.host.local mount[1528]: 2023/10/19 14:59:21.629189 juicefs[1528] <INFO>: OK, myjfs is ready at /mnt/s3 [mount_unix.go:48]
Oct 19 14:59:21 my.host.local systemd[1]: Mounted Juicefs.

Environment:

  • JuiceFS version: 1.1.0+2023-09-04.08c4ae6
  • OS (e.g cat /etc/os-release): Oracle Linux Server 9.2
  • Kernel (e.g. uname -a): Linux 5.15.0-106.131.4.el9uek.x86_64
  • Object storage (cloud provider and region, or self maintained): self maintained
  • Metadata engine info (version, cloud provider managed or self maintained): self maintained KeyDB v6.3.3
  • Network connectivity (JuiceFS to metadata engine, JuiceFS to object storage): Local Net
  • Others:

lowshoe avatar Oct 19 '23 13:10 lowshoe

Short update: if i add "nofail" to the mount options, it works.

lowshoe avatar Oct 23 '23 10:10 lowshoe

Hi @lowshoe , there are some details are required:

  • The log of juicefs when boot process gets stuck in
  • Does the keydb also run on this machine? How does it get boot?

Since the "nofail" option works, maybe the juicefs mount stuck because the keydb is unready at that time.

Hexilee avatar Oct 24 '23 03:10 Hexilee

@Hexilee ,

  • regarding to the logs the juicefs mount succeeds even if it stucks afterwards.
Oct 24 11:09:19 my.host.local mount[1147]: 2023/10/24 11:09:19.730415 juicefs[1147] <INFO>: Meta address: rediss://my.redis.local:6379/1 [interface.go:497]
Oct 24 11:09:19 my.host.local polkitd[1370]: Started polkitd version 0.117
Oct 24 11:09:19 my.host.local systemd[1]: Started Dynamic System Tuning Daemon.
Oct 24 11:09:19 my.host.local mount[1147]: 2023/10/24 11:09:19.786790 juicefs[1147] <WARNING>: AOF is not enabled, you may lose data if Redis is not shutdown properly. [info.go:84]
Oct 24 11:09:19 my.host.local mount[1147]: 2023/10/24 11:09:19.787518 juicefs[1147] <INFO>: Ping redis latency: 264.06µs [redis.go:3572]
Oct 24 11:09:19 my.host.local mount[1147]: 2023/10/24 11:09:19.790203 juicefs[1147] <INFO>: Data use s3://vol-www/myjfs/ [mount.go:605]
Oct 24 11:09:19 my.host.local mount[1147]: 2023/10/24 11:09:19.791922 juicefs[1147] <INFO>: Disk cache (/var/jfsCache/96a19d9e-6e9b-44e2-b202-4d87592015d0/): capacity (1024000 MB), free ratio (10%), max pending pages (15) [disk_cache.go:114]
Oct 24 11:09:19 my.host.local polkitd[1370]: Loading rules from directory /etc/polkit-1/rules.d
Oct 24 11:09:19 my.host.local polkitd[1370]: Loading rules from directory /usr/share/polkit-1/rules.d
Oct 24 11:09:19 my.host.local polkitd[1370]: Finished loading, compiling and executing 6 rules
Oct 24 11:09:19 my.host.local systemd[1]: Started Authorization Manager.
Oct 24 11:09:19 my.host.local polkitd[1370]: Acquired the name org.freedesktop.PolicyKit1 on the system bus
Oct 24 11:09:20 my.host.local /sbin/mount.juicefs[1408]: juicefs[1408] <INFO>: Create session 85 OK with version: 1.1.0+2023-09-04.08c4ae6 [base.go:492]
Oct 24 11:09:20 my.host.local /sbin/mount.juicefs[1408]: juicefs[1408] <INFO>: Prometheus metrics listening on 127.0.0.1:9567 [mount.go:160]
Oct 24 11:09:20 my.host.local /sbin/mount.juicefs[1408]: juicefs[1408] <INFO>: Mounting volume myjfs at /mnt/s3 ... [mount_unix.go:269]
Oct 24 11:09:20 my.host.local mount[1147]: 2023/10/24 11:09:20.297623 juicefs[1147] <INFO>: OK, myjfs is ready at /mnt/s3 [mount_unix.go:48]
Oct 24 11:09:20 my.host.local systemd[1]: Mounted Juicefs.
Oct 24 11:09:29 my.host.local systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
Oct 24 11:09:49 my.host.local systemd[1]: systemd-hostnamed.service: Deactivated successfully.

in comparison, these are the logs when i add "nofail":

Oct 24 11:35:30 my.host.local mount[1147]: 2023/10/24 11:35:30.460690 juicefs[1147] <INFO>: Meta address: rediss://my.redis.local:6379/1 [interface.go:497]
Oct 24 11:35:30 my.host.local systemd[1]: Started Dynamic System Tuning Daemon.
Oct 24 11:35:30 my.host.local mount[1147]: 2023/10/24 11:35:30.503779 juicefs[1147] <WARNING>: AOF is not enabled, you may lose data if Redis is not shutdown properly. [info.go:84]
Oct 24 11:35:30 my.host.local mount[1147]: 2023/10/24 11:35:30.504045 juicefs[1147] <INFO>: Ping redis latency: 210.623µs [redis.go:3572]
Oct 24 11:35:30 my.host.local mount[1147]: 2023/10/24 11:35:30.506891 juicefs[1147] <INFO>: Data use s3://vol-www/myjfs/ [mount.go:605]
Oct 24 11:35:30 my.host.local mount[1147]: 2023/10/24 11:35:30.507656 juicefs[1147] <INFO>: Disk cache (/var/jfsCache/96a19d9e-6e9b-44e2-b202-4d87592015d0/): capacity (1024000 MB), free ratio (10%), max pending pages (15) [disk_cache.go:114]
Oct 24 11:35:30 my.host.local polkitd[1379]: Loading rules from directory /etc/polkit-1/rules.d
Oct 24 11:35:30 my.host.local polkitd[1379]: Loading rules from directory /usr/share/polkit-1/rules.d
Oct 24 11:35:30 my.host.local polkitd[1379]: Finished loading, compiling and executing 6 rules
Oct 24 11:35:30 my.host.local systemd[1]: Started Authorization Manager.
Oct 24 11:35:30 my.host.local polkitd[1379]: Acquired the name org.freedesktop.PolicyKit1 on the system bus
Oct 24 11:35:30 my.host.local /sbin/mount.juicefs[1416]: juicefs[1416] <INFO>: Create session 93 OK with version: 1.1.0+2023-09-04.08c4ae6 [base.go:492]
Oct 24 11:35:30 my.host.local /sbin/mount.juicefs[1416]: juicefs[1416] <INFO>: Prometheus metrics listening on 127.0.0.1:9567 [mount.go:160]
Oct 24 11:35:30 my.host.local /sbin/mount.juicefs[1416]: juicefs[1416] <INFO>: Mounting volume myjfs at /mnt/s3 ... [mount_unix.go:269]
Oct 24 11:35:31 my.host.local mount[1147]: 2023/10/24 11:35:31.012020 juicefs[1147] <INFO>: OK, myjfs is ready at /mnt/s3 [mount_unix.go:48]
Oct 24 11:35:31 my.host.local systemd[1]: Mounted Juicefs.
Oct 24 11:35:31 my.host.local systemd[1]: Reached target Multi-User System.
Oct 24 11:35:31 my.host.local systemd[1]: Starting Record Runlevel Change in UTMP...
Oct 24 11:35:31 my.host.local systemd[1]: systemd-update-utmp-runlevel.service: Deactivated successfully.
Oct 24 11:35:31 my.host.local systemd[1]: Finished Record Runlevel Change in UTMP.
Oct 24 11:35:31 my.host.local systemd[1]: Startup finished in 775ms (kernel) + 3.314s (initrd) + 2.851s (userspace) = 6.940s.
  • the Keydb-Server is running on a different system.

Could it be that the juicefs-mount is running in the foreground and so is blocking the ongoing of the boot process? /sbin/mount.juicefs is a symbolic link to /usr/local/bin/juicefs and the default mode for juicefs is to run mount in foreground...

lowshoe avatar Oct 24 '23 10:10 lowshoe

Actually, /sbin/mount.juicefs will run in the background until you set env JFS_FOREGROUND=1. Since nofail works for your case, the systemd mount may think /sbin/mount.juicefs fails to run.

There may be some checkers to ensure the mount point is ready refer to https://man.archlinux.org/man/systemd-mount.1.en#OPTIONS . Could you try the no-block or fsck options whether work?

Hexilee avatar Oct 24 '23 16:10 Hexilee

If this helps, i've had some problems too using mount, i've switched to a service since.

[Unit]
Description=JuiceFS mount
AssertFileIsExecutable=/usr/local/bin/juicefs
After=network-online.target systemd-resolved.service
Requires=network-online.target

[Service]
# Environment variables
Environment=JFS_RSA_PASSPHRASE=xxxxxxxxxxxxxx

# Can't be Notify, Not Implemented
Type=simple

# Ensure S3 enpoint can be resolved
ExecStartPre=/bin/sh -c 'while ! host s3.endpoint.com; do /bin/sleep 0.2; done'

# Mount filesystem
ExecStart=/usr/local/bin/juicefs mount \
   --attr-cache 3600 \
   --backup-meta 0 \
   --cache-dir /var/cache/juicefs \
   --writeback \
   "badger:///var/lib/badger" \
   /opt/data

ExecStop=/bin/sh -c 'while lsof | grep /opt/data >/dev/null; do /bin/sleep 2; done'
ExecStop=/usr/local/bin/juicefs umount /opt/data

# Let systemd restart this service
Restart=always
RestartSec=5

# Disable timeout logic and wait until process is stopped
TimeoutStopSec=infinity
SendSIGKILL=no

# Priorities
OOMScoreAdjust=-500
IOSchedulingPriority=1

[Install]
WantedBy=multi-user.target

solracsf avatar Oct 25 '23 12:10 solracsf