opendal icon indicating copy to clipboard operation
opendal copied to clipboard

op.stat("/") would return OK even if the bucket does not exist

Open tisonkun opened this issue 1 year ago • 3 comments

Describe the bug

Configure an S3 compatible backend, said MinIO.

I'd like to use op.stat("/") to ensure that the bucket is ready, or else it should return not found.

Steps to Reproduce

op = /* create operator */ ;
(|| async { op.stat("/").await })
                .retry(&ExponentialBuilder::default())
                .await
                .unwrap();

op.remove_all("").await.unwrap()

stat will return Metadata { metakey: FlagSet(Complete | Mode), mode: DIR, cache_control: None, content_disposition: None, content_length: None, content_md5: None, content_range: None, content_type: None, etag: None, last_modified: None, version: None } but the remove_all will fail with NotFound (permanent) at List::next => S3Error { code: "NoSuchBucket", message: "The specified bucket does not exist" ... }.

Expected Behavior

stat should return NotFound or NotSuchBucket if the bucket does not exist.

Additional Context

Related code exists at CompleteAccessor::complete_stat:

https://github.com/apache/opendal/blob/31d09673093d92a87e72575d1b00bdd4529dcf56/core/src/layers/complete.rs#L170-L178

Are you willing to submit a PR to fix this bug?

  • [ ] Yes, I would like to submit a PR.

tisonkun avatar Jul 13 '24 21:07 tisonkun

The stat("/") function is treated specially because opendal requires the storage backend to ensure that the service is available and that the root is always valid.

We implemented this feature because we noticed that users frequently perform stat("/") before beginning operations, and 99% of these actions are unnecessary. We have a separate check() function for this purpose.

It does surprise the user, though; I will try to document it instead.

Xuanwo avatar Jul 14 '24 11:07 Xuanwo

The stat("/") function is treated specially because opendal requires the storage backend to ensure that the service is available and that the root is always valid.

What does it mean? If the operator is misconfigured, such op should fail.

I will try to document it instead.

I have two questions here:

  1. What if we fall through the following code without special checks?
  2. What if we call check for stat("/")?

tisonkun avatar Jul 14 '24 18:07 tisonkun

There is a history behind using stat("/") and stat("path/to/dir/").

Previously, services like S3 did not support statting a directory; they would always return Ok because object storage services lack a concept of directories. As a workaround, we treated / specially to enable it to return quickly without going through the entire process.

In newer versions, we've improved how we handle stat("path/to/dir/") by verifying the existence of the path on S3 using list("path/to/dir/"). However, listing operations are expensive (10 times more than GET/HEAD) and slow (1/10 as fast as HEAD). Therefore, we have kept the logic for stat("/") unchanged.

Perhaps we can relocate the list-stat simulation logic to S3 and allow users to enable/disable it, ensuring that stat("/") is always sent with a real request.

Xuanwo avatar Jul 15 '24 04:07 Xuanwo

No actions to take so far. let's close.

Xuanwo avatar Jan 09 '25 09:01 Xuanwo