libcloud icon indicating copy to clipboard operation
libcloud copied to clipboard

Deletion of S3 objects fails for object paths starting with a slash

Open jan-mue opened this issue 3 years ago • 7 comments

Summary

When trying to delete all objects in an S3 bucket, the deletion fails for objects with a leading slash in their path. The delete_object method returns True for these objects, even though they are not deleted.

Detailed Information

To delete an S3 bucket we use Container.list_objects() and delete_object to first delete all objects in the bucket. This doesn't work for objects, where the object path starts with a slash. The delete_object function returns True for these objects even though they are not deleted because the DELETE request is sent to the wrong URL. For example to delete the object s3://container-name//path/to/object the DELETE request is sent to https://s3-eu-west-1.amazonaws.com/container-name/path/to/object but it should be sent to this path instead: https://s3-eu-west-1.amazonaws.com//container-name/path/to/object The problem is that double slashes are replaced in the request URL in the method Connection.morph_action_hook. This is a similar issue to #1529.

To reproduce the issue, you can use the following code (replace credentials and bucket name):

from libcloud.storage.types import Provider
from libcloud.storage.providers import get_driver


driver = get_driver(Provider.S3)
client = driver("key", "secret", region="eu-west-1")

container = client.get_container("container-name")
objects = container.list_objects()
for obj in objects:
    client.delete_object(obj)
client.delete_container(container)

This will result in an ContainerIsNotEmptyError if the bucket "container-name" contains objects where the object path starts with a slash.

Libcloud version: 3.4.1 Python version: 3.7.12

jan-mue avatar Jan 31 '22 13:01 jan-mue

Thanks for reporting this bug.

IIRC, we had a similar bug recently, but don't recall anymore if it was S3 or some other driver (need to dig in).

Kami avatar Feb 01 '22 16:02 Kami

EDIT: I think I found it - #1529, #1531.

So libcloud.common.base.ALLOW_PATH_DOUBLE_SLASHES = True workaround should work as well - https://github.com/apache/libcloud/pull/1531/files#diff-9162be95b923e5cb64a41d8eb40fed8319817ac497231758ceab596e548e3ebaR67.

Maybe in a future release we can consider changing the default behavior to allow double slashes, but in the past I was a concerned that could negatively impact or break things.

Kami avatar Feb 01 '22 16:02 Kami

As I‘ve already written in the original issue details, setting this constant doesn’t work. We would need it as a parameter to the constructor of the S3 driver class. If you would accept a PR with such a change, I can start working on it tomorrow.

jan-mue avatar Feb 01 '22 19:02 jan-mue

@jan-mue Sorry, I don't know how I missed that part in the description.

Yes, in the short term I think adding constructor argument ex_allow_slashes or similar is fine.

As far as that constant not working - perhaps you could clarify what you mean with "doesn't work because we don't import anything from the module libcloud.common.base directly.".

You can't import that module and set the module level constant in your code (at the top, before importing other libcloud resources)?

Kami avatar Feb 01 '22 19:02 Kami

As far as that constant not working - perhaps you could clarify what you mean with "doesn't work because we don't import anything from the module libcloud.common.base directly.".

You can't import that module and set the module level constant in your code (at the top, before importing other libcloud resources)?

EDIT: We just tested it again and setting the constant actually does work. Sorry for the confusion.

Yes, I tried importing it and setting the constant before anything else, but for some reason this change was not picked up. I assumed that it would only work for imports directly from the module libcloud.common.base but I might be wrong about that.

jan-mue avatar Feb 01 '22 20:02 jan-mue

@jan-mue Thanks for confirming.

Is there anything else we could do to make that easier / more obvious to the end user (docs, code or similar)?

I guess one think we could perhaps do is update the S3 driver code to emit a warning on this in case the object name contains a leading slash.

As I said above, long term I do think we should probably change the default behavior, but this may break / change some of the code which depends on the old behavior - so we would need to do it in a major release and still leave that constant in place so user can revert it back in case it breaks their code.

Kami avatar Feb 03 '22 10:02 Kami

I would actually expect libcloud to raise an ObjectDoesNotExistError when the double slash is removed from the object path, but this is probably hard to implement since AWS still returns a response with status 204 even if the object isn’t deleted. This would also immediately break any code that expects the delete_object function to return True for objects that don’t exist. So maybe a warning would be enough for now and then this change can be made in a new major release.

jan-mue avatar Feb 05 '22 10:02 jan-mue