Supervisor cannot Handle NFS Mounts with Stale File Handles
Describe the issue you are experiencing
Occasionally (usually at least once a day), my NFS backup mount will just disconnect, and I will get a repair saying it failed. Reloading does not work, and it says check the logs. The logs say it is a stale file handle. The only way to resolve the issue is to reboot the host for HAOS.
I know I am using Unraid, and Unraid has been shown to not have the best performance, but I have 9 machines (7 Linux and 2 Windows) connected to this server via NFS, and Home Assistant is the only one that regularly breaks, so there seems to be something wrong with how HA Supervisor manages the NFS mount.
It is rather annoying needing to reboot HA potentially multiple times a day.
What type of installation are you running?
Home Assistant OS
Which operating system are you running on?
Home Assistant Operating System
Steps to reproduce the issue
- Set up an NFS mount
- ???
- It Breaks
Anything in the Supervisor logs that might be useful for us?
Supervisor logs:
24-03-27 08:15:30 INFO (MainThread) [supervisor.mounts.manager] Reloading mount: backup
24-03-27 08:15:30 ERROR (MainThread) [supervisor.mounts.mount] Reloading backup did not succeed. Check host logs for errors from mount or systemd unit mnt-data-supervisor-mounts-backup.mount for details.
24-03-27 08:15:38 INFO (MainThread) [supervisor.mounts.manager] Reloading mount: backup
24-03-27 08:15:38 ERROR (MainThread) [supervisor.mounts.mount] Reloading backup did not succeed. Check host logs for errors from mount or systemd unit mnt-data-supervisor-mounts-backup.mount for details.
Host logs:
Mar 27 12:15:30 home systemd[1]: Reloading Supervisor nfs mount: backup...
Mar 27 12:15:30 home mount[2768611]: mount.nfs: Stale file handle
Mar 27 12:15:30 home systemd[1]: mnt-data-supervisor-mounts-backup.mount: Mount process exited, code=exited, status=1/FAILURE
Mar 27 12:15:30 home systemd[1]: Reload failed for Supervisor nfs mount: backup.
Mar 27 12:15:37 home systemd[1]: run-docker-runtime\x2drunc-moby-ff86bbb670854cdfd1f1814e02525b7fd2c179dc9087c182208732cd9b618704-runc.0pBNcO.mount: Deactivated successfully.
Mar 27 12:15:38 home systemd[1]: Reloading Supervisor nfs mount: backup...
Mar 27 12:15:38 home mount[2769226]: mount.nfs: Stale file handle
Mar 27 12:15:38 home systemd[1]: mnt-data-supervisor-mounts-backup.mount: Mount process exited, code=exited, status=1/FAILURE
Mar 27 12:15:38 home systemd[1]: Reload failed for Supervisor nfs mount: backup.
System Health information
System Information
| version | core-2024.3.2 |
|---|---|
| installation_type | Home Assistant OS |
| dev | false |
| hassio | true |
| docker | true |
| user | root |
| virtualenv | false |
| python_version | 3.12.2 |
| os_name | Linux |
| os_version | 6.6.20-haos |
| arch | x86_64 |
| timezone | America/New_York |
| config_dir | /config |
Home Assistant Community Store
| GitHub API | ok |
|---|---|
| GitHub Content | ok |
| GitHub Web | ok |
| GitHub API Calls Remaining | 4885 |
| Installed Version | 1.34.0 |
| Stage | running |
| Available Repositories | 1406 |
| Downloaded Repositories | 21 |
Home Assistant Cloud
| logged_in | true |
|---|---|
| subscription_expiration | December 31, 2017 at 7:00 PM |
| relayer_connected | false |
| relayer_region | null |
| remote_enabled | false |
| remote_connected | false |
| alexa_enabled | false |
| google_enabled | false |
| remote_server | null |
| certificate_status | null |
| instance_id | 6d1891b515664ee79d1010b60c891d36 |
| can_reach_cert_server | ok |
| can_reach_cloud_auth | ok |
| can_reach_cloud | ok |
Home Assistant Supervisor
| host_os | Home Assistant OS 12.1 |
|---|---|
| update_channel | stable |
| supervisor_version | supervisor-2024.03.0 |
| agent_version | 1.6.0 |
| docker_version | 24.0.7 |
| disk_total | 109.3 GB |
| disk_used | 67.6 GB |
| healthy | true |
| supported | true |
| board | generic-x86-64 |
| supervisor_api | ok |
| version_api | ok |
| installed_addons | Studio Code Server (5.15.0), Mosquitto broker (6.4.0), Advanced SSH & Web Terminal (17.2.0), Promtail (2.2.0), Node-RED (17.0.10), Z-Wave JS UI (3.4.1), Github Actions Runner (3), Zigbee2MQTT (1.36.0-1), SQLite Web (4.1.2), Whisper (2.0.0), Piper (1.5.0), Silicon Labs Multiprotocol (2.4.4), Matter Server (5.4.1), openWakeWord (1.10.0), ESPHome (2024.3.1) |
Dashboards
| dashboards | 3 |
|---|---|
| resources | 17 |
| views | 35 |
| mode | yaml |
Recorder
| oldest_recorder_run | March 17, 2024 at 7:48 PM |
|---|---|
| current_recorder_run | March 27, 2024 at 8:23 AM |
| estimated_db_size | 6373.67 MiB |
| database_engine | sqlite |
| database_version | 3.44.2 |
Supervisor diagnostics
config_entry-hassio-56582080815fc62cb9d95187042d8c10.json
Additional information
No response
> I know I am using Unraid, and Unraid has been shown to not have the best performance, but I have 9 machines (7 Linux and 2 Windows) connected to this server via NFS, and Home Assistant is the only one that regularly breaks, so there seems to be something wrong with how HA Supervisor manages the NFS mount.
Are those other Linux systems also constantly connected?
> Occasionally (usually at least once a day), my NFS backup mount will just disconnect, and I will get a repair saying it failed.
I guess this is kinda the root of the problem: The question is why does it disconnect?
Home Assistant does a mount manager reload every 15 minutes, which triggers a check. My guess is that this check fails every now and then, and from that failure onwards the (re-)mount is not possible.
The StackExchange thread "mount.nfs: Stale file handle error - cannot umount" has some information on why remounting fails in this case.
I guess we can't really do anything client side. Is the server constantly on?
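To narrow down when the mount actually goes stale, rather than when the 15-minute reload notices it, the mount point could be probed directly. The following is a minimal Python sketch, not anything the Supervisor does. It assumes the mount lives at `/mnt/data/supervisor/mounts/backup` (the path in the host logs), that it is run from a shell that can see that path, and that a stale handle surfaces as `ESTALE` from a `stat` call. The name `is_stale` is made up for illustration.

```python
import errno
import os


def is_stale(path, stat_fn=os.stat):
    """Return True if stat() on path fails with ESTALE (stale file handle).

    stat_fn is injectable only so the function can be exercised without
    a real NFS mount; by default it is os.stat.
    """
    try:
        stat_fn(path)
    except OSError as err:
        # Any other error (missing path, permission denied, ...) is
        # reported as "not stale" because it is a different problem.
        return err.errno == errno.ESTALE
    return False


if __name__ == "__main__":
    # Path taken from the host logs in this issue; adjust for other mounts.
    mount_point = "/mnt/data/supervisor/mounts/backup"
    state = "stale" if is_stale(mount_point) else "not stale (or failing for another reason)"
    print(mount_point, state)
```

Cached attributes can hide a stale handle from a plain `stat`, so a negative result here does not prove the mount is healthy. Listing the directory may be a stricter probe.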
Yes. All of the other machines are constantly connected and only HA has the issue. Most of them are constantly using the mount as well.
The mounts are managed by the OS. Here is the fstab for them (on the Linux machines). They basically never disconnect or get stale file handles.
ip:/mnt/user/backup /unraid/backup nfs defaults,timeo=900,retrans=5,_netdev 0 0
For Home Assistant network storage, the Supervisor creates a systemd mount unit on Home Assistant OS, which in turn calls the operating system mount commands (in your case the unit name is mnt-data-supervisor-mounts-backup.mount). In the end, this is not much different from an fstab entry.
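To illustrate how close an fstab entry and a systemd mount unit are, here is a rough Python sketch that turns one fstab line into a unit name and a `[Mount]` section. It is not the unit the Supervisor generates: the helper names are hypothetical, the `Description=` text is invented, the options are copied verbatim from the fstab line, and the path escaping is a simplified version of systemd's rules.

```python
import string

# Characters systemd leaves unescaped in unit names derived from paths.
_SAFE = set(string.ascii_letters + string.digits + ":_.")


def mount_unit_name(where):
    """Derive a .mount unit name from a mount point (simplified escaping)."""
    # Drop leading/trailing and repeated slashes before escaping.
    trimmed = "/".join(part for part in where.split("/") if part)
    if not trimmed:
        return "-.mount"  # the root file system
    out = []
    for i, ch in enumerate(trimmed):
        if ch == "/":
            out.append("-")
        elif ch in _SAFE and not (i == 0 and ch == "."):
            out.append(ch)
        else:
            # Everything else, including "-", becomes \xNN per byte.
            out.append("".join(f"\\x{b:02x}" for b in ch.encode("utf-8")))
    return "".join(out) + ".mount"


def fstab_to_mount_unit(line):
    """Return (unit_name, unit_text) for a single fstab line."""
    what, where, fstype, options = line.split()[:4]
    text = "\n".join([
        "[Unit]",
        f"Description=Sketch of a mount unit for {where}",
        "",
        "[Mount]",
        f"What={what}",
        f"Where={where}",
        f"Type={fstype}",
        f"Options={options}",
    ])
    return mount_unit_name(where), text


if __name__ == "__main__":
    # The fstab line posted earlier in this thread.
    name, text = fstab_to_mount_unit(
        "ip:/mnt/user/backup /unraid/backup nfs defaults,timeo=900,retrans=5,_netdev 0 0"
    )
    print(name)
    print(text)
```

Passing `/mnt/data/supervisor/mounts/backup` to `mount_unit_name` yields the unit name seen in the host logs, which is why that name shows up in the error messages.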
The host logs you shared are from the point when you try to reload the mount using the repair, correct?
Is there maybe some log entry before that, from when it potentially breaks? Can you also check the Supervisor logs to see whether they have anything from the time the breakage potentially happens?
I suspect that the 15-minute mount reload somehow causes havoc in your case.
There is nothing additional in the host logs; they just say it failed. It looks like the reload shown in the logs from the original post is what triggers the repair saying it failed. The Supervisor logs do not seem to go back further than a few minutes, so I doubt I will get logs from the precise time it fails.
2024-04-12 23:48:45.052 home kernel: audit: type=1334 audit(1712965725.050:486): prog-id=153 op=LOAD
2024-04-12 23:48:45.061 home systemd[1]: Starting Time & Date Service...
2024-04-12 23:48:45.235 home systemd[1]: Started Time & Date Service.
2024-04-12 23:49:15.077 home systemd[1]: systemd-hostnamed.service: Deactivated successfully.
2024-04-12 23:49:15.117 home kernel: audit: type=1334 audit(1712965755.115:487): prog-id=150 op=UNLOAD
2024-04-12 23:49:15.117 home kernel: audit: type=1334 audit(1712965755.115:488): prog-id=149 op=UNLOAD
2024-04-12 23:49:15.117 home kernel: audit: type=1334 audit(1712965755.115:489): prog-id=148 op=UNLOAD
2024-04-12 23:49:15.271 home systemd[1]: systemd-timedated.service: Deactivated successfully.
2024-04-12 23:49:15.279 home kernel: audit: type=1334 audit(1712965755.277:490): prog-id=151 op=UNLOAD
2024-04-12 23:49:15.280 home kernel: audit: type=1334 audit(1712965755.278:491): prog-id=153 op=UNLOAD
2024-04-12 23:49:15.280 home kernel: audit: type=1334 audit(1712965755.278:492): prog-id=152 op=UNLOAD
2024-04-12 23:49:45.326 home systemd[1]: run-docker-runtime\x2drunc-moby-aed93476013bfd982b3b1f514014b9e882e9049b4c6fc64d299ec3dfbf528428-runc.wfFqIe.mount: Deactivated successfully.
2024-04-12 23:51:30.674 home systemd[1]: run-docker-runtime\x2drunc-moby-f49e1580244ed850a42e3254039536671a44479852b57792b29b264a9137a441-runc.Ke9OTz.mount: Deactivated successfully.
2024-04-12 23:54:05.898 home systemd[1]: run-docker-runtime\x2drunc-moby-a83502dd733fdaea40c9a3420078da2be9ec1ac84f17f279f87a3e83e9faaae6-runc.DKvkXi.mount: Deactivated successfully.
2024-04-13 00:23:33.532 home systemd[1]: run-docker-runtime\x2drunc-moby-f49e1580244ed850a42e3254039536671a44479852b57792b29b264a9137a441-runc.Cgon3N.mount: Deactivated successfully.
2024-04-13 00:23:57.576 home systemd[1]: run-docker-runtime\x2drunc-moby-3278db46567e871cfcd6876f170de06748997d57cd5b9d615f687797b363aa87-runc.BpXC5N.mount: Deactivated successfully.
2024-04-13 00:29:38.724 home systemd[1]: run-docker-runtime\x2drunc-moby-3278db46567e871cfcd6876f170de06748997d57cd5b9d615f687797b363aa87-runc.XObDMb.mount: Deactivated successfully.
2024-04-13 00:40:39.058 home systemd[1]: run-docker-runtime\x2drunc-moby-f49e1580244ed850a42e3254039536671a44479852b57792b29b264a9137a441-runc.jGYQVZ.mount: Deactivated successfully.
2024-04-13 00:44:12.925 home systemd[1]: run-docker-runtime\x2drunc-moby-aed93476013bfd982b3b1f514014b9e882e9049b4c6fc64d299ec3dfbf528428-runc.Uz7psY.mount: Deactivated successfully.
2024-04-13 00:53:10.614 home systemd[1]: run-docker-runtime\x2drunc-moby-a83502dd733fdaea40c9a3420078da2be9ec1ac84f17f279f87a3e83e9faaae6-runc.SHnJLU.mount: Deactivated successfully.
2024-04-13 01:09:03.222 home systemd[1]: run-docker-runtime\x2drunc-moby-f3317a65d25dbbf4987cbf55a84b57b1d8ae9c8b8366ce4bb696733ba193809b-runc.q2BxV0.mount: Deactivated successfully.
2024-04-13 01:13:06.464 home systemd[1]: run-docker-runtime\x2drunc-moby-3278db46567e871cfcd6876f170de06748997d57cd5b9d615f687797b363aa87-runc.8DujJZ.mount: Deactivated successfully.
2024-04-13 01:13:10.969 home systemd[1]: run-docker-runtime\x2drunc-moby-781da2ccd41348462c38f5f55cfe364a326f699a56f13c6a3fae50acc15d91e9-runc.oXV3gP.mount: Deactivated successfully.
2024-04-13 01:49:12.468 home systemd[1]: run-docker-runtime\x2drunc-moby-3278db46567e871cfcd6876f170de06748997d57cd5b9d615f687797b363aa87-runc.0OECJK.mount: Deactivated successfully.
2024-04-13 01:55:15.402 home systemd[1]: run-docker-runtime\x2drunc-moby-a83502dd733fdaea40c9a3420078da2be9ec1ac84f17f279f87a3e83e9faaae6-runc.VMm56T.mount: Deactivated successfully.
2024-04-13 01:55:25.340 home kernel: audit: type=1334 audit(1712973325.338:493): prog-id=154 op=LOAD
2024-04-13 01:55:25.340 home kernel: audit: type=1334 audit(1712973325.338:494): prog-id=155 op=LOAD
2024-04-13 01:55:25.340 home kernel: audit: type=1334 audit(1712973325.338:495): prog-id=156 op=LOAD
2024-04-13 01:55:25.346 home systemd[1]: Starting Hostname Service...
2024-04-13 01:55:25.540 home systemd[1]: Started Hostname Service.
2024-04-13 01:55:25.549 home kernel: audit: type=1334 audit(1712973325.547:496): prog-id=157 op=LOAD
2024-04-13 01:55:25.549 home kernel: audit: type=1334 audit(1712973325.547:497): prog-id=158 op=LOAD
2024-04-13 01:55:25.549 home kernel: audit: type=1334 audit(1712973325.547:498): prog-id=159 op=LOAD
2024-04-13 01:55:25.559 home systemd[1]: Starting Time & Date Service...
2024-04-13 01:55:25.759 home systemd[1]: Started Time & Date Service.
2024-04-13 01:55:55.575 home systemd[1]: systemd-hostnamed.service: Deactivated successfully.
2024-04-13 01:55:55.604 home kernel: audit: type=1334 audit(1712973355.602:499): prog-id=156 op=UNLOAD
2024-04-13 01:55:55.604 home kernel: audit: type=1334 audit(1712973355.602:500): prog-id=155 op=UNLOAD
2024-04-13 01:55:55.604 home kernel: audit: type=1334 audit(1712973355.602:501): prog-id=154 op=UNLOAD
2024-04-13 01:55:55.793 home systemd[1]: systemd-timedated.service: Deactivated successfully.
2024-04-13 01:55:55.802 home kernel: audit: type=1334 audit(1712973355.800:502): prog-id=159 op=UNLOAD
2024-04-13 01:55:55.802 home kernel: audit: type=1334 audit(1712973355.800:503): prog-id=158 op=UNLOAD
2024-04-13 01:55:55.802 home kernel: audit: type=1334 audit(1712973355.800:504): prog-id=157 op=UNLOAD
2024-04-13 02:05:07.417 home systemd[1]: run-docker-runtime\x2drunc-moby-f3317a65d25dbbf4987cbf55a84b57b1d8ae9c8b8366ce4bb696733ba193809b-runc.g2zHGn.mount: Deactivated successfully.
2024-04-13 02:10:12.091 home systemd[1]: run-docker-runtime\x2drunc-moby-859d4c404a41d70d1540670bece13ac8099a0e7bd80d1923b87eb9ec4557972b-runc.6Ia0cK.mount: Deactivated successfully.
2024-04-13 02:16:12.527 home systemd[1]: run-docker-runtime\x2drunc-moby-859d4c404a41d70d1540670bece13ac8099a0e7bd80d1923b87eb9ec4557972b-runc.R8UVOV.mount: Deactivated successfully.
2024-04-13 02:22:04.309 home systemd[1]: Reloading Supervisor nfs mount: backup...
2024-04-13 02:22:04.317 home mount[510850]: mount.nfs: Stale file handle for (null) on /mnt/data/supervisor/mounts/backup
2024-04-13 02:22:04.318 home systemd[1]: mnt-data-supervisor-mounts-backup.mount: Mount process exited, code=exited, status=1/FAILURE
2024-04-13 02:22:04.318 home systemd[1]: Reload failed for Supervisor nfs mount: backup.
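Picking the relevant lines out of a long host log by hand is error-prone, so a small script can help. This Python sketch assumes lines in the `YYYY-MM-DD HH:MM:SS.mmm host message` layout shown above and returns the first `Stale file handle` line together with the few lines before it. The function names and the `window` parameter are made up for illustration.

```python
from datetime import datetime


def parse_line(line):
    """Split 'YYYY-MM-DD HH:MM:SS.mmm host message' into (timestamp, message)."""
    date, time, _host, message = line.split(" ", 3)
    stamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S.%f")
    return stamp, message


def context_before_stale(lines, window=3):
    """Return up to `window` entries before the first stale-handle line, plus that line."""
    parsed = [parse_line(line) for line in lines if line.strip()]
    for i, (_stamp, message) in enumerate(parsed):
        if "Stale file handle" in message:
            return parsed[max(0, i - window):i + 1]
    return []  # no stale-handle entry found


if __name__ == "__main__":
    # Two lines copied from the host log above.
    sample = [
        "2024-04-13 02:22:04.309 home systemd[1]: Reloading Supervisor nfs mount: backup...",
        "2024-04-13 02:22:04.317 home mount[510850]: mount.nfs: Stale file handle for (null) on /mnt/data/supervisor/mounts/backup",
    ]
    for stamp, message in context_before_stale(sample):
        print(stamp.isoformat(), message)
```

In the log above, the entries before the failed reload are only container runtime mounts being deactivated, which matches the observation that the host log records nothing at the moment the handle goes stale.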
There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates. Please make sure to update to the latest version and check if that solves the issue. Let us know if that works for you by adding a comment 👍 This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.
Not stale