
Accepting path-escaped URLs as input

Open hartzell opened this issue 2 years ago • 0 comments

I've been doing a massive S3 cleanup, and the names of the objects I have to deal with seem to contain every imaginable character.

I'm working from an S3 bucket inventory, which escapes the URLs in the CSVs that it creates.

I unescaped them for decision making, then tried to figure out a clean way to quote paths so that I could feed them to s5cmd run.

It eventually dawned on me that it would be simpler to just feed s5cmd the escaped paths and teach it to unescape them.

This is work-related, so I'm not in a position to develop it into a PR (sigh...), but I thought I'd share the basic change:

  1. for feedback; and
  2. as a starting place for someone who needs it and/or can turn it into a feature.

Here's the simple diff:

diff --git a/storage/url/url.go b/storage/url/url.go
index 22ab37d..e53aa1e 100644
--- a/storage/url/url.go
+++ b/storage/url/url.go
@@ -92,6 +92,10 @@ func New(s string, opts ...Option) (*URL, error) {
        if len(parts) == 2 {
                key = parts[1]
        }
+       key, err := url.PathUnescape(key)
+       if err != nil {
+               return nil, err
+       }

        if bucket == "" {
                return nil, fmt.Errorf("s3 url should have a bucket")

It seems to have worked for me, feeding millions of lines of rm <URL> into s5cmd run.
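
For anyone who wants to try the idea outside of s5cmd, here is a minimal standalone sketch of the same approach. It is not s5cmd's actual code: `splitAndUnescape` is a hypothetical helper that splits an `s3://bucket/key` string the way the diff's context suggests and then percent-decodes only the key with `net/url`'s `PathUnescape`.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// splitAndUnescape is a hypothetical helper (not part of s5cmd).
// It splits "s3://bucket/key" into bucket and key, then
// percent-decodes the key only.
func splitAndUnescape(raw string) (bucket, key string, err error) {
	const prefix = "s3://"
	if !strings.HasPrefix(raw, prefix) {
		return "", "", fmt.Errorf("not an s3 url: %q", raw)
	}

	// Split once on the first "/" so the key keeps its own slashes.
	parts := strings.SplitN(strings.TrimPrefix(raw, prefix), "/", 2)
	bucket = parts[0]
	if len(parts) == 2 {
		key = parts[1]
	}
	if bucket == "" {
		return "", "", fmt.Errorf("s3 url should have a bucket")
	}

	// PathUnescape decodes %XX sequences but leaves "+" untouched,
	// unlike QueryUnescape, which would turn "+" into a space.
	key, err = url.PathUnescape(key)
	if err != nil {
		return "", "", err
	}
	return bucket, key, nil
}

func main() {
	inputs := []string{
		"s3://bucket/my%20file%2B1.txt", // escaped space and plus
		"s3://bucket/100%",              // stray "%" is not a valid escape
	}
	for _, in := range inputs {
		bucket, key, err := splitAndUnescape(in)
		if err != nil {
			fmt.Printf("%s: error: %v\n", in, err)
			continue
		}
		fmt.Printf("%s: bucket=%q key=%q\n", in, bucket, key)
	}
}
```

One caveat with unescaping unconditionally: a key that is not escaped but contains a literal `%` will either be rejected (as with `100%` above) or silently decoded into a different name, so a real feature would probably want this behind an opt-in flag.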

hartzell, Aug 15 '22 14:08