s5cmd
Accepting path-escaped URLs as input
I've been doing a massive S3 cleanup, and the names of the objects I have to deal with seem to contain every imaginable character.
I'm working from an S3 bucket inventory, which escapes the URLs in the CSVs that it creates.
I unescaped them for decision making, then tried to figure out a clean way to quote the paths so that I could feed them to s5cmd run.
It eventually dawned on me that it would be simpler to just feed s5cmd the escaped paths and teach it to unescape them.
This is work-related, so I'm not in a position to develop it into a PR (sigh...), but I thought I'd share the basic change:
- for feedback; and
- as a starting place for someone who needs it and/or can turn it into a feature.
Here's the simple diff:
diff --git a/storage/url/url.go b/storage/url/url.go
index 22ab37d..e53aa1e 100644
--- a/storage/url/url.go
+++ b/storage/url/url.go
@@ -92,6 +92,10 @@ func New(s string, opts ...Option) (*URL, error) {
 	if len(parts) == 2 {
 		key = parts[1]
 	}
+	key, err := url.PathUnescape(key)
+	if err != nil {
+		return nil, err
+	}
 	if bucket == "" {
 		return nil, fmt.Errorf("s3 url should have a bucket")
It seems to have worked for me, feeding millions of lines of rm <URL> into s5cmd run.
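For anyone who wants to try the idea outside of s5cmd first, here is a minimal standalone sketch of the same unescaping step. The helper name splitAndUnescape, the bucket name, and the sample keys are all made up for illustration; this is not s5cmd's actual parsing code, just the bucket/key split from the diff plus the standard library's url.PathUnescape.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// splitAndUnescape is a hypothetical helper (not part of s5cmd). It splits an
// "s3://bucket/key" string into bucket and key, then path-unescapes the key,
// mirroring the change proposed in the diff.
func splitAndUnescape(s string) (bucket, key string, err error) {
	s = strings.TrimPrefix(s, "s3://")
	parts := strings.SplitN(s, "/", 2)
	bucket = parts[0]
	if bucket == "" {
		return "", "", fmt.Errorf("s3 url should have a bucket")
	}
	if len(parts) == 2 {
		// PathUnescape decodes %XX sequences and returns an error on
		// malformed escapes. Unlike QueryUnescape, it leaves '+' as-is.
		key, err = url.PathUnescape(parts[1])
		if err != nil {
			return "", "", err
		}
	}
	return bucket, key, nil
}

func main() {
	// Made-up sample inputs: an escaped space and '#', a literal '+' next to
	// an escaped '+', and a malformed escape sequence.
	inputs := []string{
		"s3://my-bucket/logs/report%20%231.csv",
		"s3://my-bucket/a+b%2Bc",
		"s3://my-bucket/bad%zz",
	}
	for _, in := range inputs {
		bucket, key, err := splitAndUnescape(in)
		fmt.Printf("%q -> bucket=%q key=%q err=%v\n", in, bucket, key, err)
	}
}
```

Two things worth keeping in mind with this approach: url.PathUnescape does not turn '+' into a space (url.QueryUnescape does), so which one is appropriate depends on how the input was escaped; and unescaping unconditionally, as the diff does, would also alter keys that were never escaped but happen to contain a literal % sequence, so a real feature would probably want this behind an option.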