
Upload Folder to S3 Bucket


Hello,

I see that currently there is no way to supply a folder, rather than a file, to be uploaded to S3. Is there a reason this wasn't implemented? From googling, I find that the Go AWS SDK doesn't seem to offer a straightforward way to do this, in stark contrast to the Java SDK and even the Node SDK. I would love to contribute this if the methods exist, but I wanted to make sure there wasn't a particular reason it hadn't been implemented yet.

Thanks!

Jacob

jaibhavaya avatar Apr 09 '18 17:04 jaibhavaya
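For reference, the Go SDK point above is real: unlike the Java SDK, aws-sdk-go (v1) has no directory-level upload call, so a resource would have to walk the tree and upload file by file. A minimal sketch of what that looks like; the region, bucket, prefix, and directory here are placeholders, not anything from s3-resource:

```go
package main

import (
	"log"
	"os"
	"path/filepath"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3/s3manager"
)

func main() {
	// Placeholder region; credentials come from the usual SDK chain.
	sess := session.Must(session.NewSession(aws.NewConfig().WithRegion("us-east-1")))
	uploader := s3manager.NewUploader(sess)

	root := "./docs"      // local directory to upload (placeholder)
	bucket := "my-bucket" // destination bucket (placeholder)
	prefix := "site/"     // key prefix the files land under (placeholder)

	err := filepath.Walk(root, func(path string, info os.FileInfo, walkErr error) error {
		if walkErr != nil || info.IsDir() {
			return walkErr
		}
		rel, err := filepath.Rel(root, path)
		if err != nil {
			return err
		}
		f, err := os.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()

		// One upload per file: S3 itself has no "upload directory" operation.
		_, err = uploader.Upload(&s3manager.UploadInput{
			Bucket: aws.String(bucket),
			Key:    aws.String(prefix + filepath.ToSlash(rel)),
			Body:   f,
		})
		return err
	})
	if err != nil {
		log.Fatal(err)
	}
}
```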

I think it is because an S3 bucket is not a filesystem, so uploading a directory (which might contain nested subdirectories) is tricky to get right. The usual Concourse approach is simply to create an archive file containing the directory you want to upload (maybe filtering only the files you are really interested in). This works fine. There are also third-party resources that do what you want, for example https://github.com/18F/s3-resource-simple

See also #66

marco-m avatar May 31 '18 19:05 marco-m
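The archive approach above is typically a one-liner in a task script (`tar czf docs.tgz docs`) followed by a `put` to this resource. For completeness, a Go sketch of the same packing step, with placeholder paths:

```go
package main

import (
	"archive/tar"
	"compress/gzip"
	"io"
	"log"
	"os"
	"path/filepath"
)

func main() {
	out, err := os.Create("docs.tgz") // archive handed to s3-resource (placeholder name)
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	gw := gzip.NewWriter(out)
	defer gw.Close()
	tw := tar.NewWriter(gw)
	defer tw.Close()

	root := "./docs" // directory to pack (placeholder)
	err = filepath.Walk(root, func(path string, info os.FileInfo, walkErr error) error {
		if walkErr != nil || info.IsDir() {
			return walkErr
		}
		// This is also where you could filter to only the files you care about.
		hdr, err := tar.FileInfoHeader(info, "")
		if err != nil {
			return err
		}
		rel, err := filepath.Rel(root, path)
		if err != nil {
			return err
		}
		hdr.Name = filepath.ToSlash(rel)
		if err := tw.WriteHeader(hdr); err != nil {
			return err
		}
		f, err := os.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()
		_, err = io.Copy(tw, f)
		return err
	})
	if err != nil {
		log.Fatal(err)
	}
}
```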

@vito are you open to this possibility, or do you consider this out of scope for s3-resource?

marco-m avatar Sep 28 '18 09:09 marco-m

Sounds related to https://github.com/concourse/s3-resource/issues/55 too. As long as the resource can sanely version everything that it's uploading (perhaps by having a version number in the folder name or all filenames), it could make sense, but I'd still like to see the real-world use case for this (where uploading a .tgz would also not suffice) prior to someone implementing it. So far the ask has just been to upload a directory, but no one's given the context as to why. 🙂

vito avatar Sep 28 '18 14:09 vito

I can give one use case: from a Concourse pipeline, we upload two types of artifacts to S3: a .tgz with s3-resource, and a directory with s3-resource-simple. The reason we upload a directory is that we use an S3 bucket as a place to publish the HTML documentation of a project. It is nice to upload the directory directly, because then the bucket can be "served" as-is. The advantage is simplicity: you don't need to find a way to "unpack" the .tgz on S3. Granted, I am not sure this justifies the added complexity in s3-resource, so I don't think this single use case is enough.

marco-m avatar Sep 28 '18 16:09 marco-m

Makes sense, with the caveat that using this resource for that only makes sense if you also intend to version the documentation. If you're just looking to upload (and possibly replace) the docs each time you publish, that'd be better served by a publishing task or something.

vito avatar Sep 28 '18 18:09 vito

Yes, I forgot to mention that. The documentation is versioned; it is associated with a given commit.

marco-m avatar Sep 28 '18 18:09 marco-m

We have another example for this kind of resource: serving static assets from S3. We want to upload a folder containing all our static assets, to be served from the CDN connected to the bucket.

Currently we are doing this with a command-line `aws s3 sync`. I feel that using a resource would be much cleaner.

EduardoAC avatar Mar 01 '19 09:03 EduardoAC
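The `aws s3 sync` workaround amounts to shelling out to the CLI from a task script; the CLI handles the recursion and only re-uploads changed files. A minimal Go sketch of the same call, with placeholder paths and bucket:

```go
package main

import (
	"log"
	"os"
	"os/exec"
)

func main() {
	// Placeholder source directory and bucket; --delete removes remote
	// objects that no longer exist locally.
	cmd := exec.Command("aws", "s3", "sync", "./static", "s3://my-cdn-bucket/static", "--delete")
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}
```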

@EduardoAC +1, that's our use case as well.

jchampio avatar Mar 19 '19 19:03 jchampio

Why isn't this a feature? It seems pretty basic to upload "a directory of objects" into object storage. We don't even care about versioning; we handle versioning on our own via the names of the files.

kallisti5 avatar Aug 13 '19 15:08 kallisti5

@kallisti5 It's not a feature because no one has PR'd it. :slightly_smiling_face: Our team isn't big enough to handle everything ourselves.

vito avatar Aug 13 '19 15:08 vito

@vito fair. I started hacking away at things. The theory at the moment is that "disabling versioning" results in the ability to "just upload things".

diff --git a/README.md b/README.md
index 078532b..ff11366 100644
--- a/README.md
+++ b/README.md
@@ -35,6 +35,8 @@ version numbers.
 * `disable_ssl`: *Optional.* Disable SSL for the endpoint, useful for S3
   compatible providers without SSL.
 
+* `skip_versioning`: *Optional.* Don't version artifacts; any previous artifacts will be overwritten.
+
 * `skip_ssl_verification`: *Optional.* Skip SSL verification for S3 endpoint. Useful for S3 compatible providers using self-signed SSL certificates.
 
 * `skip_download`: *Optional.* Skip downloading object from S3. Useful only trigger the pipeline without using the object.
@@ -51,7 +53,7 @@ version numbers.
 
 ### File Names
 
-One of the following two options must be specified:
+For versioning, one of the following two options must be specified:
 
 * `regexp`: *Optional.* The pattern to match filenames against within S3. The first
   grouped match is used to extract the version, or if a group is explicitly
diff --git a/check/command.go b/check/command.go
index b7d302d..965c184 100644
--- a/check/command.go
+++ b/check/command.go
@@ -24,6 +24,8 @@ func (command *Command) Run(request Request) (Response, error) {
 
        if request.Source.Regexp != "" {
                return command.checkByRegex(request), nil
+       } else if request.Source.SkipVersioning {
+               return command.checkByPath(request), nil
        } else {
                return command.checkByVersionedFile(request), nil
        }
diff --git a/models.go b/models.go
index c2f4adf..6225a1c 100644
--- a/models.go
+++ b/models.go
@@ -15,6 +15,7 @@ type Source struct {
        ServerSideEncryption string `json:"server_side_encryption"`
        SSEKMSKeyId          string `json:"sse_kms_key_id"`
        UseV2Signing         bool   `json:"use_v2_signing"`
+       SkipVersioning       bool   `json:"skip_versioning"`
        SkipSSLVerification  bool   `json:"skip_ssl_verification"`
        SkipDownload         bool   `json:"skip_download"`
        InitialVersion       string `json:"initial_version"`
@@ -25,14 +26,18 @@ type Source struct {
 }
 
 func (source Source) IsValid() (bool, string) {
-       if source.Regexp != "" && source.VersionedFile != "" {
-               return false, "please specify either regexp or versioned_file"
+       if !source.SkipVersioning && (source.Regexp != "" && source.VersionedFile != "") {
+               return false, "please specify either regexp or versioned_file for versioning"
        }
 
        if source.Regexp != "" && source.InitialVersion != "" {
                return false, "please use initial_path when regexp is set"
        }
 
+       if source.SkipVersioning && source.InitialVersion != "" {
+               return false, "please use initial_path when not using versioning"
+       }
+
        if source.VersionedFile != "" && source.InitialPath != "" {
                return false, "please use initial_version when versioned_file is set"
        }

The logic is weird, though, given how this resource works. My thinking is that "initial path" would be the local path which gets recursively uploaded?

Haiku is running short on time... we need to get our package repository uploads working, so for now we're just using an s3 client as a workaround, as others have done.

kallisti5 avatar Aug 19 '19 13:08 kallisti5
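The diff above calls a `checkByPath` that does not exist upstream, so here is a hypothetical sketch of what it might look like. The `BucketFiles` listing, the `Response`/`Version` shapes, and the use of `initial_path` as the S3 prefix are all assumptions modeled on the existing regexp-based check, not real s3-resource code:

```go
// Hypothetical sketch, not upstream code. With versioning disabled there is
// no version number to extract, so every object key under the configured
// path is reported as its own "version".
func (command *Command) checkByPath(request Request) Response {
	response := Response{}

	// Assumed to list object keys under a prefix, analogous to the listing
	// the regexp-based check performs. Whether initial_path is the right
	// prefix here is exactly the open question from the comment above.
	paths, err := command.s3client.BucketFiles(request.Source.Bucket, request.Source.InitialPath)
	if err != nil {
		return response
	}

	for _, path := range paths {
		response = append(response, Version{Path: path})
	}
	return response
}
```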