cloudpathlib icon indicating copy to clipboard operation
cloudpathlib copied to clipboard

Overly specific asserts on S3Client._remove

Open mcclurem opened this issue 2 years ago • 3 comments

Thanks a bunch for this library, I've been trying to make use of it with an s3-emulating service that is not 100% bit-for-bit compliant and I've run into issues in a couple places. Specifically I've noticed that in S3Client._remove, you assert for response==204, the clone-service I'm using responds with a 200 when you successfully delete instead. I'd love to submit a PR to fix this but I need some input. I think it should be acceptable to check if the response is a 2xx code and by doing so you're more flexible. In addition, asserts outside of unit testing are considered hazardous, instead this should raise an exception of some kind. Do you have a preferred exception to be raised here instead? or should it be left to boto to raise the client error?

mcclurem avatar Feb 28 '22 17:02 mcclurem

@mcclurem Out of curiosity, which S3-emulating service are you using?

Happy to take a PR on this. Answers to specific questions below.


I think it should be acceptable to check if the response is a 2xx code and by doing so you're more flexible.

Yep, seems reasonable.

In addition, asserts outside of unit testing are considered hazardous, instead this should raise an exception of some kind.

Agreed that those should raise exceptions, not assert

Do you have a preferred exception to be raised here instead?

You can add a custom exception if there is not already one that makes sense here: https://github.com/drivendataorg/cloudpathlib/blob/master/cloudpathlib/exceptions.py

pjbull avatar Feb 28 '22 22:02 pjbull

Hi, seems like checking for a 200 or 204 status code (instead of 204 alone) should be enough, since the action performed is a DELETE. About what to do with the exception, seems like any possible error should raise a ClientError (which is already handled) except if the file does not exist, in which case it won't raise an exception (boto3 docs).

https://github.com/drivendataorg/cloudpathlib/blob/4cc0d33b8e80a7f7eef0fae0f9acb38b8b1f62bf/cloudpathlib/s3/s3client.py#L229-L237

But seems like that case was already covered as mentioned in this comment a couples line further:

https://github.com/drivendataorg/cloudpathlib/blob/4cc0d33b8e80a7f7eef0fae0f9acb38b8b1f62bf/cloudpathlib/s3/s3client.py#L249-L255

A possibility would be to remove both assertions and raise FileNotFoundError when necessary (if missing_ok is False) in both sections. Would that be OK?

    def _remove(self, cloud_path: S3Path, missing_ok: bool = True) -> None:
        try:
            obj = self.s3.Object(cloud_path.bucket, cloud_path.key)

            # will throw if not a file
            obj.load()

            resp = obj.delete()
-           assert resp.get("ResponseMetadata").get("HTTPStatusCode") == 204
+           # Assumes that a non-successful code means that the file does not exist
+           if resp.get("ResponseMetadata").get("HTTPStatusCode") not in (200, 204) and not missing_ok:
+               raise FileNotFoundError(f"File does not exist: {cloud_path}")

        except ClientError:
            # try to delete as a direcotry instead
            bucket = self.s3.Bucket(cloud_path.bucket)

            prefix = cloud_path.key
            if prefix and not prefix.endswith("/"):
                prefix += "/"

            resp = bucket.objects.filter(Prefix=prefix).delete()

-            # ensure directory deleted; if cloud_path did not exist at all
-            # resp will be [], so no need to check success
-            if resp:
-                assert resp[0].get("ResponseMetadata").get("HTTPStatusCode") == 200
-            else:
-                if not missing_ok:
-                    raise FileNotFoundError(f"File does not exist: {cloud_path}")
+           # Assumes that a non-successful code means that the file does not exist
+           if not resp and not missing_ok:
+               raise FileNotFoundError(f"File does not exist: {cloud_path}")

Of course, this is just speculation, I would need to test the suggested implementation and confirm which statuses does the response get. This is just a draft ⚠️

By the way, when comparing the _remove method for the S3, Azure and GS clients seems like the S3 version is quite different, so the exception has no equivalent in the other clients.

aaossa avatar Sep 23 '22 17:09 aaossa

@aaossa Yep, that implementation looks good to me initially, thanks. Like you said, it will need some testing. Maybe update the message when you raise FileNotFoundError to indicate we were trying to delete that file.

pjbull avatar Sep 24 '22 21:09 pjbull