op.stat("/") would return OK even if the bucket does not exist
Describe the bug
Configure an S3 compatible backend, said MinIO.
I'd like to use op.stat("/") to ensure that the bucket is ready, or else it should return not found.
Steps to Reproduce
op = /* create operator */ ;
(|| async { op.stat("/").await })
.retry(&ExponentialBuilder::default())
.await
.unwrap();
op.remove_all("").await.unwrap()
stat will return Metadata { metakey: FlagSet(Complete | Mode), mode: DIR, cache_control: None, content_disposition: None, content_length: None, content_md5: None, content_range: None, content_type: None, etag: None, last_modified: None, version: None } but the remove_all will fail with NotFound (permanent) at List::next => S3Error { code: "NoSuchBucket", message: "The specified bucket does not exist" ... }.
Expected Behavior
stat should return NotFound or NotSuchBucket if the bucket does not exist.
Additional Context
Related code exists at CompleteAccessor::complete_stat:
https://github.com/apache/opendal/blob/31d09673093d92a87e72575d1b00bdd4529dcf56/core/src/layers/complete.rs#L170-L178
Are you willing to submit a PR to fix this bug?
- [ ] Yes, I would like to submit a PR.
The stat("/") function is treated specially because opendal requires the storage backend to ensure that the service is available and that the root is always valid.
We implemented this feature because we noticed that users frequently perform stat("/") before beginning operations, and 99% of these actions are unnecessary. We have a separate check() function for this purpose.
It does surprise the user, though; I will try to document it instead.
The stat("/") function is treated specially because opendal requires the storage backend to ensure that the service is available and that the root is always valid.
What does it mean? If the operator is misconfigured, such op should fail.
I will try to document it instead.
I have two questions here:
- What if we fall through the following code without special checks?
- What if we call
checkforstat("/")?
There is a history behind using stat("/") and stat("path/to/dir/").
Previously, services like S3 did not support statting a directory; they would always return Ok because object storage services lack a concept of directories. As a workaround, we treated / specially to enable it to return quickly without going through the entire process.
In newer versions, we've improved how we handle stat("path/to/dir/") by verifying the existence of the path on S3 using list("path/to/dir/"). However, listing operations are expensive (10 times more than GET/HEAD) and slow (1/10 as fast as HEAD). Therefore, we have kept the logic for stat("/") unchanged.
Perhaps we can relocate the list-stat simulation logic to S3 and allow users to enable/disable it, ensuring that stat("/") is always sent with a real request.
No actions to take so far. let's close.