trunk-recorder
Fatal crash: boost "Invalid cross-device link"
- Linux <hostname-redacted> 5.10.0-25-amd64 #1 SMP Debian 5.10.191-1 (2023-08-16) x86_64 GNU/Linux
- Docker version 24.0.5, build ced0996
- Using edge docker image from docker hub - /data is mounted volume
Program crashes with the following:
boost::filesystem::copy_file: Invalid cross-device link: "/dev/shm/codtrs5/9580-1693338509_852987500.wav", "/data/codtrs5/2023/8/29/9580-1693338509_852987500.wav"
0x7f443a0325d9: (gr::tagged_stream_block::check_topology(int, int)+0x2e49)
0x7f4439c2f24c: (std::rethrow_exception(std::__exception_ptr::exception_ptr)+0x7c)
0x7f4439c2f2b7: (std::terminate()+0x17)
0x7f4439c2f23e: (std::rethrow_exception(std::__exception_ptr::exception_ptr)+0x6e)
0x5595a221e941: (Call_Concluder::manage_call_data_workers()+0xeb1)
0x5595a2140604: (monitor_messages()+0x394)
0x5595a2134210: (main+0x740)
0x7f443987bd90: (__libc_init_first+0x90)
0x7f443987be40: (__libc_start_main+0x80)
0x5595a2137ab5: (_start+0x25)
Problem seems to be mitigated by setting transmissionArchive to false.
If you still want to keep transmission archives, the other option is to set tempDir to the same directory (or at least the same drive) as captureDir in the config file. Keeping both of those on the same device should avoid the issue, but you'll miss any benefit of recording all the individual transmissions to a tmpfs instead of storage media.
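One way to check whether two directories really sit on the same device is to compare the st_dev field returned by POSIX stat(). This is a minimal sketch, assuming a Linux host; on_same_device is a hypothetical helper for illustration and not part of trunk-recorder.

```cpp
#include <sys/stat.h>  // POSIX stat()
#include <string>

// Hypothetical helper: returns true when both paths live on the same
// filesystem (identical st_dev). Returns false if either path cannot
// be stat'ed, e.g. because it does not exist.
bool on_same_device(const std::string& a, const std::string& b) {
    struct stat sa{};
    struct stat sb{};
    if (stat(a.c_str(), &sa) != 0 || stat(b.c_str(), &sb) != 0) {
        return false;
    }
    return sa.st_dev == sb.st_dev;
}
```

With the setup from the crash log, on_same_device("/dev/shm", "/data") would be expected to return false, since /dev/shm is a tmpfs and /data is a mounted volume.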
There are a handful of boost library/kernel combos that can cause this, but it's ultimately related to a kernel issue that existed between Linux 5.3 and 5.18. Boost created a workaround at some point, and it was fixed in the 6.x kernel, but some distros like Debian 11 might still run into the "cross-device link" error.
Since this only really happens under a certain set of circumstances, it might even be best that transmissionArchive: true disables the use of a temp space. If you're keeping all those wavs, it's not like the tempDir is saving any drive wear, it's just adding complexity.
Just for posterity's sake I'd like to confirm taclane's findings. My main recorder ran the TR official docker image on a Debian 11 box with a backported 6.x kernel and still ran into this error. It was configured to archive transmissions and tempDir wasn't set - I configured it to use a directory on the same volume as the existing audio storage and I can now run newer code without problem.
For more context, this still happens with the latest edge code on a fresh Debian 12 (bookworm) install with kernel 6.1.0-13. Would love any input on known working boost/kernel versions to address this, since using something like shm for temp data keeps latency-sensitive IO off of storage altogether, which enables a lot more flexibility in deployment.
This workaround also unfortunately triggered a corner case in concert with bad firmware from Samsung and caused two brand new SSDs to burn through their usable life in a couple months necessitating RMA.
It was a little convoluted to map out, but for those using transmissionArchive, the problem seems to be along the lines of:
The current boost::filesystem::copy_file will error if BOTH:
- boost < 1.76
- linux kernel 5.3 or greater (6.x included)
But std::filesystem::copy_file will only error if:
- linux kernel 5.3 or greater (6.x NOT included)
#886 should address this by checking the boost version and attempting a std::filesystem::copy_file if it detects that the boost library hasn't been updated yet. If the installed boost lib is new enough, it will use that instead, which should be a better workaround for anyone using kernel 5.3-5.18.
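For anyone who wants to see the general shape of a defensive copy, here is a rough sketch. To be clear, this is not what #886 does (that change picks between the boost and std copy based on the boost version). It is a different, std-only approach: try std::filesystem::copy_file, and if the failure is specifically EXDEV ("Invalid cross-device link"), fall back to a plain read/write loop. The names stream_copy and copy_with_fallback are hypothetical.

```cpp
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// Plain userspace copy: read the source in chunks and write them out.
// Returns true only if the whole file was read and written.
bool stream_copy(const fs::path& from, const fs::path& to) {
    std::ifstream in(from, std::ios::binary);
    if (!in) return false;  // unreadable source: leave destination untouched
    std::ofstream out(to, std::ios::binary | std::ios::trunc);
    if (!out) return false;

    char buf[64 * 1024];
    while (in) {
        in.read(buf, sizeof buf);
        out.write(buf, in.gcount());  // write however many bytes were read
    }
    out.flush();
    return in.eof() && out.good();
}

// Hypothetical helper (not trunk-recorder's code): try the library copy
// first, and only fall back to stream_copy when the error is EXDEV.
bool copy_with_fallback(const fs::path& from, const fs::path& to) {
    std::error_code ec;
    if (fs::copy_file(from, to, ec)) return true;
    if (ec == std::errc::cross_device_link) return stream_copy(from, to);
    return false;  // any other failure is reported to the caller
}
```

The fallback is deliberately narrow: other errors (missing source, existing destination, permissions) are still surfaced rather than papered over.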
I just tried this with kernel 6.5 / libboost 1.74, and it prevented the previous error from occurring as the transmission wavs were copied out of the /dev/shm tmpfs to disk.
Pulled the latest docker image (edge tag that includes #886), let tempDir default back to shm, and it ran all night without an issue. Good stuff!
Cool!
All that's left is to pull in #887 to fix a typo for boost compatibility going forward (>1.76), and that should hopefully be the end of this issue.
MERGED!! 3 Cheers to @taclane for squashing this bug 🎉
Looks like there still might be a race condition hiding in the workaround somewhere; I get crashes about every 24-36hrs that seem to reference copying a transmission from temp to archive but the file already exists. The docker image still has boost 1.74 so I expect if we can bump that up to something newer than 1.76 then it'll probably defuse the landmine for good.