
Large cache directories cause Mountpoint failure before mount

Open eschernau-fmi opened this issue 10 months ago • 9 comments

Mountpoint for Amazon S3 version

mount-s3 1.13.0

AWS Region

No response

Describe the running environment

EC2 instance running Rocky Linux 8.10

Mountpoint options

/usr/bin/mount-s3 --read-only --allow-other --file-mode 0555 --dir-mode 0555 --part-size 134217728 --metadata-ttl 300 --cache /opt/mountpoint/cache/$BUCKET --max-cache-size 1024 $BUCKET $MOUNT

What happened?

If an instance hangs (typically due to running out of memory) and we have to hard-reboot it from the AWS console, Mountpoint will frequently not come back up after boot until we manually remove all the files in the cache directory.

Relevant log output

Jan 28 22:48:24 hostname mount-s3[1529]: [ERROR] mountpoint_s3::cli: timeout after 30 seconds waiting for message from child process
Jan 28 22:48:24 hostname systemd[1]: mountpoint.service: Control process exited, code=exited status=1
Jan 28 22:48:24 hostname systemd[1]: mountpoint.service: Failed with result 'exit-code'.
Jan 28 22:48:24 hostname systemd[1]: Failed to start Service to mount the xxxx bucket, $bucket, at /mount using aws mountpoint.

eschernau-fmi avatar Jan 30 '25 16:01 eschernau-fmi

Hi @eschernau-fmi, could you collect the logs for Mountpoint with the --debug flag? That could help us confirm whether the mount is stuck while cleaning the cache directory.

How are you manually removing the files in the cache directory? Is it taking long (e.g. longer than 30s)? Does the user have different permissions than the mount-s3 process?
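For reference, the ownership and permissions on the cache directory could be checked with something like this (path taken from your mount command):

ls -ld /opt/mountpoint/cache/$BUCKET
ls -l /opt/mountpoint/cache/$BUCKET | head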

passaro avatar Jan 31 '25 12:01 passaro

Unfortunately I can't run the machine in debug mode; the logs are too massive and grow without end.

Everything is running as root.

On boot, I will get alerts from our system monitoring tool for mountpoint errors in syslog, so I manually log into the host and do:

rm -rf /opt/mountpoint/cache/$bucket/*

Which takes a short period of time, maybe 5-8 seconds. Then I can do a 'systemctl start $myservice' and it comes up.

For reference, systemd service file is:

[Unit]
Description=Service to mount the bucket, $bucket, at $mount using aws mountpoint
After=network-online.target
AssertPathIsDirectory=$mount

[Service]
Type=forking
User=root
ExecStart=/usr/bin/mount-s3 --read-only --allow-other --file-mode 0555 --dir-mode 0555 --part-size 134217728 --metadata-ttl 300 --cache /opt/mountpoint/cache/$bucket --max-cache-size 1024 $bucket $mount
ExecStop=/usr/bin/fusermount -u $mount
OOMScoreAdjust=-1000

[Install]
WantedBy=multi-user.target
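One thing I'm considering as a workaround (untested, just a sketch) is having systemd clear the cache directory before every mount, so a stale cache left behind by a hard reboot can't block startup. Something like this added to the [Service] section:

# Hypothetical addition to the [Service] section above; not verified.
# systemd does not expand shell globs, so remove and recreate the directory itself.
ExecStartPre=/bin/rm -rf /opt/mountpoint/cache/$bucket
ExecStartPre=/bin/mkdir -p /opt/mountpoint/cache/$bucket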

eschernau-fmi avatar Jan 31 '25 15:01 eschernau-fmi

[..] Which takes a short period of time, maybe 5-8 seconds

How many files are in the cache directory? From your --max-cache-size 1024, I'd expect roughly 1024 of them, so it's surprising that the removal takes that long.

Unfortunately I can't run the machine in debug mode, the logs are too massive and grow without end.

As an alternative, could you try and run mount-s3 manually when you log into the host after a failure but before running rm? You could use the same arguments as in the service file plus --debug. That would hopefully tell us why it is failing/stuck.
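For example, keeping the same arguments as in your service file (paths as in your unit; only --debug added):

/usr/bin/mount-s3 --debug --read-only --allow-other --file-mode 0555 --dir-mode 0555 --part-size 134217728 --metadata-ttl 300 --cache /opt/mountpoint/cache/$bucket --max-cache-size 1024 $bucket $mount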

passaro avatar Jan 31 '25 16:01 passaro

Good idea, I hadn't thought of that. I'll make sure we run it in debug next time this happens.

eschernau-fmi avatar Jan 31 '25 17:01 eschernau-fmi

@eschernau-fmi, on second thought, depending on the size of the files, the cache could contain a lot more small blocks. Could you report how many there are in one of the cases where you have to manually clear the cache?
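Something like this should give a rough count (assuming the cache path from your service file):

find /opt/mountpoint/cache/$bucket -type f | wc -l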

passaro avatar Feb 03 '25 14:02 passaro

I have the same problem because the cache data is too large. I set the max cache size to 300 GB, and I have to manually delete the cache directory and then mount again. That works, but with a large cache it causes a fair amount of downtime.
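To reduce the downtime, one approach I'm thinking about (just a sketch, not verified) is to rename the stale cache directory, mount immediately against a fresh empty one, and delete the old data in the background:

# Sketch only; the paths and service name are placeholders from this thread.
mv /opt/mountpoint/cache/$BUCKET /opt/mountpoint/cache/$BUCKET.stale
mkdir -p /opt/mountpoint/cache/$BUCKET
systemctl start $myservice
rm -rf /opt/mountpoint/cache/$BUCKET.stale &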

nguyenminhdungpg avatar Apr 09 '25 03:04 nguyenminhdungpg

Hi @passaro, I'm pointing --cache at an EBS volume mounted to my EC2 instance. In this case, is the metadata cache stored in RAM or on disk? I'm asking because I'm going to set --metadata-ttl to indefinite and wonder whether the metadata cache can exceed the RAM or disk size. Thank you very much.

nguyenminhdungpg avatar Apr 21 '25 04:04 nguyenminhdungpg

Hi @passaro, I'm pointing --cache at an EBS volume mounted to my EC2 instance. In this case, is the metadata cache stored in RAM or on disk? I'm asking because I'm going to set --metadata-ttl to indefinite and wonder whether the metadata cache can exceed the RAM or disk size. Thank you very much.


Hey @nguyenminhdungpg. Today, the metadata cache is stored in RAM.

dannycjones avatar Apr 22 '25 06:04 dannycjones

@dannycjones thank you very much.

nguyenminhdungpg avatar Apr 23 '25 02:04 nguyenminhdungpg