mountpoint-s3
mountpoint-s3 copied to clipboard
`os.listdir` (and presumably other os functions) fails in the face of signals
Mountpoint for Amazon S3 version
mount-s3 1.7.2
AWS Region
us-east-1
Describe the running environment
Running on EC2, accessing an S3 bucket through my account, using this AMI: Deep Learning Base Proprietary Nvidia Driver GPU AMI (Ubuntu 20.04) 20240314
. Same setup as here: https://github.com/plasma-umass/scalene/issues/841
Mountpoint options
mount-s3 <MY_BUCKET> /mnt/mounted_bucket_dir
What happened?
This error (failure when running in a mounted S3 system) was brought to my attention with this issue with the Scalene profiler: https://github.com/plasma-umass/scalene/issues/841
The root cause turns out to be the CPU timer signal; if the os.listdir
function is interrupted by a signal.SIGALRM
, the call fails with an OSError
. I set the frequency below to a level that triggers the failure roughly half the time; setting it to 1 second makes it never happen. Since the default CPU sampling frequency used by Scalene is 0.01 seconds, it fails consistently. Note that the profiler py-spy
also causes this failure.
MRE here:
import os
import signal
import time
def signal_handler(signum, frame):
print("signal")
pass
signal.setitimer(signal.ITIMER_REAL, 0.02) # setting to 0.01 triggers failures reliably
signal.signal(signal.SIGALRM, signal_handler)
dirname = "/mnt/mounted_bucket_dir"
assert os.path.exists(dirname)
assert os.access(dirname, os.R_OK)
print(len(os.listdir(dirname)))
Example failure:
Traceback (most recent call last):
File "test-mount-s3.py", line 16, in <module>
print(len(os.listdir(dirname)))
OSError: [Errno 22] Invalid argument: '/mnt/mounted_bucket_dir'
I have implemented a workaround for this situation (wrapping all os
functions so that they block the SIGALRM
signal) - https://github.com/plasma-umass/scalene/pull/842 - , but it seems like it is exposing a race condition in mount-s3, or at minimum, undesirable behavior.
Relevant log output
2024-07-29T00:47:59.230133Z WARN mountpoint_s3::cli: failed to detect network throughput. Using 10 gbps as throughput. Use --maximum-throughput-gbps CLI flag to configure a target throughput appropriate for the instance. Detection failed due to: failed to get network throughput
2024-07-29T00:48:15.393878Z WARN readdirplus{req=10 ino=1 fh=1 offset=1}: mountpoint_s3::fuse: readdirplus failed: out-of-order readdir, expected=3, actual=1