
Navigating sub-directories/buckets?

Open · justinperkins opened this issue 12 years ago • 9 comments

Once you dig into a bucket and retrieve the object contents, you just get a giant list of everything. When you're trying to read/list contents on a per-directory basis, this proves difficult.

Is there any way, or a future plan, to allow navigating through sub-directories (or buckets, if that's what they really are)?
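
For illustration, here's a minimal sketch of what I mean, using the gem's documented API (bucket and key names made up):

require "s3"

service = S3::Service.new(:access_key_id => "...", :secret_access_key => "...")
bucket  = service.buckets.find("my-bucket")

# Every key comes back in one flat list, regardless of "directory" depth:
bucket.objects.map(&:key)
# => ["foo/1.txt", "foo/bar/2.txt", "foo/bar/baz/3.txt", ...]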

justinperkins · Mar 02 '12

Hm, that'd be a nice thing to have. If you have an idea of how to implement it, please submit a pull request.

Cheers.

qoobaa · Mar 02 '12

I had the same issue yesterday. The comment about sending a :delimiter to find_all in objects_extension.rb implies that's the way to do it, but the code doesn't parse the returned data correctly.
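
For reference, here's a standalone sketch of the parsing that's needed: with a :delimiter, the ListBucket XML groups keys under CommonPrefixes elements (response abridged, prefixes made up):

require "rexml/document"

xml = <<-XML
<ListBucketResult>
  <CommonPrefixes><Prefix>foo/bar/baz/one/</Prefix></CommonPrefixes>
  <CommonPrefixes><Prefix>foo/bar/baz/two/</Prefix></CommonPrefixes>
</ListBucketResult>
XML

names = []
REXML::Document.new(xml).elements.each("ListBucketResult/CommonPrefixes/Prefix") { |e| names << e.text }
names # => ["foo/bar/baz/one/", "foo/bar/baz/two/"]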

I ended up adding the following code locally to gain this functionality:

module S3
  class Bucket
    # Lists the "directories" (common prefixes) directly under options[:prefix]
    def directory_list(options = {})
      options = {:delimiter => "/"}.merge(options)
      response = bucket_request(:get, :params => options)
      parse_directory_list_result(response.body)
    end

    # Extracts the CommonPrefixes/Prefix entries from the ListBucket XML response
    def parse_directory_list_result(xml)
      names = []
      rexml_document(xml).elements.each("ListBucketResult/CommonPrefixes/Prefix") { |e| names << e.text }
      names
    end
  end
end

And then just call it with:

bucket.directory_list :prefix => "foo/bar/baz/"
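
which, assuming hypothetical keys like foo/bar/baz/one/file.txt and foo/bar/baz/two/file.txt, returns the immediate sub-directory prefixes:

# => ["foo/bar/baz/one/", "foo/bar/baz/two/"]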

Sorry it's not a full-blown pull request, but I don't have the time to make one right now. You are, of course, welcome to do whatever you want with this. I'd suggest also removing the documentation about :delimiter from objects_extension.rb, since it's a bit of a red herring.

Thanks for writing this module in the first place. - Chris

chouck · Mar 03 '12

Chouck, can you try to add some tests and create a pull request for that?

qoobaa · Mar 05 '12

I'm not really sure that patch solves the issue I was experiencing. I want to list just the top-level directories within a given bucket; once I have those, the patch becomes effective, since you can take a given directory/sub-bucket and pass it into the directory_list method.

UPDATE: This works wonderfully. Sorry for the preemptive comment.

justinperkins · Mar 06 '12

(sorry for spam)

To get objects within a given subdirectory, this patch doesn't totally solve the problem: you still have to iterate over the entire collection and select just the objects you care about, à la ...

all_objects_in_my_bucket = s3_service.buckets.find('some bucket').objects
# Group every object under its sub-directory by re-scanning the full object list for each one
objects_grouped_by_sub_dir = s3_service.buckets.find('some bucket').directory_list(:prefix => 'some directory with many sub directories').inject({}) { |memo, dir|
  memo[dir] = all_objects_in_my_bucket.select { |o| o.key.include?(dir) }
  memo
}

There's got to be a better way.

justinperkins · Mar 06 '12

True, but I think you and I are trying to solve different problems.

I have a huge tree of data, and I wanted a list of sub-directories 3 layers down (which are dynamically generated, so I don't have a fixed list). I don't want all of the files in each of those sub-directories; in fact, I only want one or two out of the thousands that are in each sub-tree.

If I'm understanding what you are saying, it sounds like you want something more like:

objects_grouped_by_sub_dir = {}
my_bucket = s3_service.buckets.find('some bucket')
prefix_list = my_bucket.directory_list(:prefix => 'some directory with many sub directories')
prefix_list.each { |prefix| objects_grouped_by_sub_dir[prefix] = my_bucket.objects.find_all(:prefix => prefix) }
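
The advantage being that find_all(:prefix => prefix) should have S3 do the filtering server-side, so you never download the whole object list just to select from it client-side.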

chouck · Mar 08 '12

Yes! Guess I should've dug into the source some more. Thanks.

justinperkins · Mar 08 '12

Sorry to resurrect an old thread, but I needed this feature very much. Any progress with this, or a linked PR? I wouldn't mind creating one!

ericmwalsh · Feb 08 '17

Also, I expanded on what @chouck created:

module S3
  class Bucket
    # this method recurses while the response coming back
    # from S3 includes a truncation flag (IsTruncated == 'true'),
    # then parses the combined response bodies' XML
    # for CommonPrefixes/Prefix, AKA directories
    def directory_list(options = {}, responses = [])
      options = {:delimiter => "/"}.merge(options)
      response = bucket_request(:get, :params => options)

      if is_truncated?(response.body)
        directory_list(options.merge({:marker => next_marker(response.body)}), responses << response.body)
      else
        parse_xml_array(responses + [response.body], options)
      end
    end

    private

    # Collects CommonPrefixes/Prefix entries from each response body; with
    # clean_path, the search prefix and delimiter are stripped so only the
    # bare folder names remain
    def parse_xml_array(xml_array, options = {}, clean_path = true)
      names = []
      xml_array.each do |xml|
        rexml_document(xml).elements.each("ListBucketResult/CommonPrefixes/Prefix") do |e|
          if clean_path
            names << e.text.gsub((options[:prefix] || ''), '').gsub((options[:delimiter] || ''), '')
          else
            names << e.text
          end
        end
      end
      names
    end

    def next_marker(xml)
      marker = nil
      rexml_document(xml).elements.each("ListBucketResult/NextMarker") {|e| marker ||= e.text }
      if marker.nil?
        raise StandardError, "truncated response did not include a NextMarker"
      else
        marker
      end
    end

    def is_truncated?(xml)
      is_truncated = nil
      rexml_document(xml).elements.each("ListBucketResult/IsTruncated") {|e| is_truncated ||= e.text }
      is_truncated == 'true'
    end
  end
end

This handles listing out directories when you run into the key limit (the S3 API's MaxKeys hard limit of 1,000 keys per response). The method recurses, collecting every response, before parsing and returning them all at once. I also added the ability to return "clean" directory names (folder names only) in lieu of the entire key/path.
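
Usage is the same as before. With hypothetical keys like logs/2017/01/a.log and logs/2017/02/b.log:

bucket.directory_list(:prefix => "logs/2017/")
# => ["01", "02"]   # the prefix and delimiter are stripped because clean_path defaults to true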

ericmwalsh · Feb 09 '17