Large cache directories cause Mountpoint failure before mount
Mountpoint for Amazon S3 version
mount-s3 1.13.0
AWS Region
No response
Describe the running environment
EC2 instance running Rocky Linux 8.10
Mountpoint options
/usr/bin/mount-s3 --read-only --allow-other --file-mode 0555 --dir-mode 0555 --part-size 134217728 --metadata-ttl 300 --cache /opt/mountpoint/cache/$BUCKET --max-cache-size 1024 $BUCKET $MOUNT
What happened?
If an instance hangs (typically due to running out of memory) and we must hard-boot it from the AWS console, Mountpoint will frequently fail to start when the instance comes back up, until we manually remove all the files in the cache directory.
Relevant log output
Jan 28 22:48:24 hostname mount-s3[1529]: [ERROR] mountpoint_s3::cli: timeout after 30 seconds waiting for message from child process
Jan 28 22:48:24 hostname systemd[1]: mountpoint.service: Control process exited, code=exited status=1
Jan 28 22:48:24 hostname systemd[1]: mountpoint.service: Failed with result 'exit-code'.
Jan 28 22:48:24 hostname systemd[1]: Failed to start Service to mount the xxxx bucket, $bucket, at /mount using aws mountpoint.
Hi @eschernau-fmi, could you collect the logs for Mountpoint with the --debug flag? That could help us confirm whether the mount is stuck while cleaning the cache directory.
How are you manually removing the files in the cache directory? Is it taking long (e.g. longer than 30s)? Does the user have different permissions from the mount-s3 process?
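For example, one quick way to check ownership and permissions on the cache directory (path taken from your mount options):

# Show owner, group, and mode for the cache directory and its entries
stat -c '%U:%G %a %n' /opt/mountpoint/cache/$BUCKET
ls -la /opt/mountpoint/cache/$BUCKET | head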
Unfortunately I can't run the machine in debug mode; the logs are too massive and grow without end.
Everything is running as root.
On boot, I will get alerts from our system monitoring tool for mountpoint errors in syslog, so I manually log into the host and do:
rm -rf /opt/mountpoint/cache/$bucket/*
That takes a short period of time, maybe 5-8 seconds. Then I can run 'systemctl start $myservice' and it comes up.
For reference, the systemd service file is:
[Unit]
Description=Service to mount the bucket, $bucket, at $mount using aws mountpoint
After=network-online.target
AssertPathIsDirectory=$mount

[Service]
Type=forking
User=root
ExecStart=/usr/bin/mount-s3 --read-only --allow-other --file-mode 0555 --dir-mode 0555 --part-size 134217728 --metadata-ttl 300 --cache /opt/mountpoint/cache/$bucket --max-cache-size 1024 $bucket $mount
ExecStop=/usr/bin/fusermount -u $mount
OOMScoreAdjust=-1000

[Install]
WantedBy=multi-user.target
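A possible interim workaround (a sketch, not an official recommendation) would be to clear the cache before each mount attempt, assuming the cached data is safe to discard on every start:

[Service]
# Hypothetical addition: remove any stale cache entries left behind by a
# hard reboot before mount-s3 starts, so they cannot block the mount.
# systemd does not expand shell globs, so use find -delete rather than rm -rf $dir/*.
ExecStartPre=/usr/bin/find /opt/mountpoint/cache/$bucket -mindepth 1 -delete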
[..] That takes a short period of time, maybe 5-8 seconds
How many files are in the cache directory? From your --max-cache-size 1024, I'd expect ~1024. It is surprising that it takes that long.
Unfortunately I can't run the machine in debug mode; the logs are too massive and grow without end.
As an alternative, could you try and run mount-s3 manually when you log into the host after a failure but before running rm? You could use the same arguments as in the service file plus --debug. That would hopefully tell us why it is failing/stuck.
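For example, with the same arguments as the service file plus --debug (and --foreground, so the process stays attached to the terminal and prints its logs there):

/usr/bin/mount-s3 --debug --foreground --read-only --allow-other --file-mode 0555 --dir-mode 0555 --part-size 134217728 --metadata-ttl 300 --cache /opt/mountpoint/cache/$bucket --max-cache-size 1024 $bucket $mount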
Good idea, I hadn't thought of that. I'll make sure we run it in debug next time this happens.
@eschernau-fmi, on second thought, depending on the size of the files, the cache could contain a lot more small blocks. Could you report how many are there in one of the cases where you have to manually clear the cache?
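For example, assuming the cache path from your service file:

# Count the cached data blocks on disk
find /opt/mountpoint/cache/$bucket -type f | wc -l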
I have the same problem because the cache data is too large. I set the max cache size to 300GB, so I have to manually delete the cache dir and then mount again. That works, but with a large cache size it causes a period of downtime.
Hi @passaro, I'm configuring --cache to point at an EBS volume mounted to my EC2 instance. In this case, is the metadata cache stored in RAM or on disk? I'm asking because I'm going to set --metadata-ttl to indefinite and wonder whether the metadata cache can exceed the RAM or disk storage size. Thank you very much.
Hey @nguyenminhdungpg. Today, the metadata cache is stored in RAM.
@dannycjones thank you very much.