mergerfs icon indicating copy to clipboard operation
mergerfs copied to clipboard

Renaming or creation of file fails (Telegram won't start)

Open RazielleS opened this issue 3 years ago • 14 comments

Describe the bug

I have recently changed from nfs4 to virtiofs for some mounts (including /home) in a VM. Since I changed the mount on the home folder, I get a weird error when starting Telegram in this VM. It complains something about not being able to open a file:

[2021.12.22 18:20:15] Launched version: 3003000, install beta: [FALSE], alpha: 0, debug mode: [FALSE]
[2021.12.22 18:20:15] Executable dir: /usr/bin/, name: telegram-desktop
[2021.12.22 18:20:15] Initial working dir: /home/raziel/
[2021.12.22 18:20:15] Working dir: /home/raziel/.local/share/TelegramDesktop/
[2021.12.22 18:20:15] Command line: telegram-desktop --
[2021.12.22 18:20:15] Executable path before check: /usr/bin/telegram-desktop
[2021.12.22 18:20:15] Logs started
[2021.12.22 18:20:15] Launcher filename: telegramdesktop.desktop
[2021.12.22 18:20:15] Connecting local socket to /run/user/1000/a0b1cff8440a846e992153710f78f8a0-{87A94AB0-E370-4cde-98D3-ACC110C5967D}...
[2021.12.22 18:20:15] Socket connect error 0, starting server and app...
[2021.12.22 18:20:15] Could not open '/home/raziel/.local/share/TelegramDesktop/log.txt' file to start new logging: No such file or directory
[2021.12.22 18:20:15] Executable path before check: /usr/bin/telegram-desktop
[2021.12.22 18:20:15] FATAL: Could not move logging to '/home/raziel/.local/share/TelegramDesktop/log.txt'!

It looks like it has something to do with mergerfs, because if I change the mount to the non-mergerfs-source, everything works.

It might be related to #989 because it sounds a bit familiar. Although this runs on an Arch Linux Server and client.

I have added an strace of the app and mergerfs, done at the same time. I hope you can do something with them, because I am a bit lost there. As far as I understand app.strace, the log file gets deleted and another one is moved in its place. But I don't understand the mergerfs.strace.

I can delete (rm) the mentioned file without any issue, and it even gets recreated when starting Telegram. But Telegram seems to see this differently.

To Reproduce

  • Bind home as mount with virtiofs
  • Start telegram

I could not reproduce it with any default tools like rm, mv, ln or touch

Expected behavior

The file should get copied / moved without errors

System information:

  • OS, kernel version: Linux sec 5.15.10-arch1-1-vfio #1 SMP PREEMPT Mon, 20 Dec 2021 17:55:46 +0000 x86_64 GNU/Linux
  • mergerfs version: 2.33.3
  • mergerfs settings
 /mnt/disk1:/mnt/disk2   /mnt/userHdd/   fuse.mergerfs   allow_other,noforget,inodecalc=path-hash,nfsopenhack=all,cache.files=partial,dropcacheonclose=true,category.create=mfs,ignorepponrename=true,fsname=diskHdd  0       0
/mnt/cache:/mnt/userHdd /mnt/user/      fuse.mergerfs   allow_other,noforget,inodecalc=path-hash,nfsopenhack=all,cache.files=partial,dropcacheonclose=true,category.create=epff,func.mkdir=epall,moveonenospc=true,minfreespace=100G,ignorepponrename=true,fsname=disk*      0       0
  • List of drives, filesystems, & sizes:
    • df -h:
diskHdd                  15T  7.9T  5.9T  58% /mnt/userHdd
disk*                    17T  8.8T  6.7T  57% /mnt/user
/dev/sdb1               7.3T  4.0T  3.0T  58% /mnt/disk1
/dev/sdc1               7.3T  4.0T  3.0T  58% /mnt/disk2
/dev/sda1               7.3T  3.8T  3.2T  55% /mnt/parity0
/dev/md127p1            1.8T  905G  835G  53% /mnt/disk
  • lsblk
NAME           MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
sda              8:0    1   7.3T  0 disk  
└─sda1           8:1    1   7.3T  0 part  /mnt/parity0
sdb              8:16   1   7.3T  0 disk  
└─sdb1           8:17   1   7.3T  0 part  /mnt/disk1
sdc              8:32   1   7.3T  0 disk  
└─sdc1           8:33   1   7.3T  0 part  /mnt/disk2
nvme2n1        259:0    0 931.5G  0 disk  
└─nvme2n1p1    259:1    0 931.5G  0 part  
  └─md127        9:127  0   1.8T  0 raid0 
    └─md127p1  259:7    0   1.8T  0 part  /mnt/cache
                                          /mnt/disk0
nvme0n1        259:2    0 931.5G  0 disk  
└─nvme0n1p1    259:3    0 931.5G  0 part  
  └─md127        9:127  0   1.8T  0 raid0 
    └─md127p1  259:7    0   1.8T  0 part  /mnt/cache
                                          /mnt/disk0
  • A strace of the application having a problem: app.strace.txt.gz (sorry for zipping it, github didn't allow me to upload another txt file somehow...)

  • strace of mergerfs while app tried to do it's thing: mergerfs.strace.txt

RazielleS avatar Dec 22 '21 17:12 RazielleS

The mergerfs trace doesn't show interacting with the log files. Either the temp file or log.txt.

Are you able to reproduce this with Telegram (or better yet simple tooling) without all that setup?

trapexit avatar Dec 22 '21 18:12 trapexit

BTW... you need to use use_ino

trapexit avatar Dec 22 '21 18:12 trapexit

I had a bunch of other issues with use_ino together with nfs. But since I am moving away from it, I can give it another try.

I now disabled the FS caching of virtiofs (server and client), because I thought maybe I get a better trace then...and instead it works now. This leaves me very confused on how virtiofs and mergerfs interact, but I am also happy that it works. I would create an issue at virtiofs...but honestly I am not sure on how to describe the issue anymore.

RazielleS avatar Dec 22 '21 18:12 RazielleS

As the docs mention use_ino is required for inodecalc to make any sense. Otherwise libfuse is just giving out random inode values.

caching could explain some of the situation but I don't know enough about virtiofs to comment without more investigation (which btw is being changed in the kernel to use basically the same codepath as FUSE filesystems)

Caching shouldn't be keeping calls from reaching mergerfs generally. Especially not create calls. Would need more investigation.

trapexit avatar Dec 22 '21 19:12 trapexit

I will finish the migration and change use_ino as soon as possible. Maybe it helps.

I will also play around with caching a bit and make a few more traces. Thank you so far :-)

RazielleS avatar Dec 22 '21 19:12 RazielleS

Turned out I got the wrong pid when doing the strace of mergerfs...There are two instances and I got the wrong one :-(

These traces now both show log.txt. Traces done when the error occured (so cache back to normal) traces.zip

RazielleS avatar Dec 22 '21 19:12 RazielleS

Oh, I also added use_ino, which no visible change so far

RazielleS avatar Dec 22 '21 19:12 RazielleS

use_ino impacts the inode generation. You set inodecalc and that has no meaning/impact unless you enable use_ino.

trapexit avatar Dec 22 '21 21:12 trapexit

It looks like the traces are from different times.

trapexit avatar Dec 22 '21 21:12 trapexit

You're right. I'll have a look at it tomorrow, looks like I am too tired to manage it right now.

RazielleS avatar Dec 22 '21 21:12 RazielleS

So this is the one. Picked the wrong file yesterday, sorry for this. Now both of them are from the same time and say something about log.txt.

traces.zip

RazielleS avatar Dec 23 '21 15:12 RazielleS

After the unlink there is no further evidence that mergerfs is being requested to do anything with log.txt. You said if you turn off some caching it works? Can you trace that?

It's calling renameat2 which mergerfs doesn't yet implement but that shouldn't matter as the kernel errors and the software falls back to a link + unlink behavior.

trapexit avatar Dec 24 '21 04:12 trapexit

This is a trace with disabled caching, when the app works.

strace.zip

RazielleS avatar Dec 25 '21 11:12 RazielleS

Everything looks the same between the two until after the temporary file for log.txt is unlinked. After that the client tries to chmod the linked log.txt file. When caching is off it succeeds and when on it fails. It fails before it even gets to mergerfs. Interestingly, after the chmod, even in the case where the chmod fails, it tries to open log.txt to append to it. With caching it fails with ENOENT implying part of the path doesn't exist and I'm not seeing anything obvious in mergerfs causing it (or that it even gets called).

trapexit avatar Dec 29 '21 20:12 trapexit