s3
Navigating sub-directories/buckets?
Once you dig into a bucket and retrieve the object contents, you just get a giant list of everything. When you're trying to read or list contents on a per-directory basis, this proves difficult.
Is there any way, or a future plan, to allow navigating through sub-directories (or buckets, if that's what they really are)?
Hm, that'd be a nice thing to have. If you have an idea how to implement that, please submit a pull request.
Cheers.
I had the same issue yesterday. The comment about sending a :delimiter to find_all in objects_extension.rb implies that is the way to do it, but the code doesn't parse the returned data correctly.
I ended up adding the following code locally to gain this functionality:
```ruby
module S3
  class Bucket
    def directory_list(options = {})
      options = {:delimiter => "/"}.merge(options)
      response = bucket_request(:get, :params => options)
      parse_directory_list_result(response.body)
    end

    def parse_directory_list_result(xml)
      names = []
      rexml_document(xml).elements.each("ListBucketResult/CommonPrefixes/Prefix") { |e| names << e.text }
      names
    end
  end
end
```
And then just call it with
```ruby
bucket.directory_list :prefix => "foo/bar/baz/"
```
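For context on what that call gets back: when a list request includes a delimiter, S3 rolls keys that share a prefix up to the next delimiter into `CommonPrefixes` entries instead of listing them one by one. The minimal sketch below parses a simplified, hand-written `ListBucketResult` (namespace and most fields omitted; the bucket name and keys are made up) with REXML, using the same XPath as the patch above. It is an illustration, not a captured S3 response.

```ruby
require "rexml/document"

# Simplified ListBucketResult, roughly what S3 might return for
# prefix "foo/bar/baz/" with delimiter "/" (sample data, not a real response).
SAMPLE_XML = <<~XML
  <ListBucketResult>
    <Name>example-bucket</Name>
    <Prefix>foo/bar/baz/</Prefix>
    <Delimiter>/</Delimiter>
    <IsTruncated>false</IsTruncated>
    <Contents><Key>foo/bar/baz/readme.txt</Key></Contents>
    <CommonPrefixes><Prefix>foo/bar/baz/2011/</Prefix></CommonPrefixes>
    <CommonPrefixes><Prefix>foo/bar/baz/2012/</Prefix></CommonPrefixes>
  </ListBucketResult>
XML

# Text of every CommonPrefixes/Prefix element, i.e. the "sub-directories".
def common_prefixes(xml)
  names = []
  REXML::Document.new(xml).elements.each("ListBucketResult/CommonPrefixes/Prefix") { |e| names << e.text }
  names
end

# Keys that sit directly under the prefix (not rolled up into a sub-directory).
def direct_keys(xml)
  keys = []
  REXML::Document.new(xml).elements.each("ListBucketResult/Contents/Key") { |e| keys << e.text }
  keys
end

p common_prefixes(SAMPLE_XML) # => ["foo/bar/baz/2011/", "foo/bar/baz/2012/"]
p direct_keys(SAMPLE_XML)     # => ["foo/bar/baz/readme.txt"]
```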
Sorry it's not a full-blown pull request, but I don't have time to make one right now. You are, of course, welcome to do whatever you want with this. I'd also suggest removing the documentation about :delimiter from objects_extension.rb, since it's a bit of a red herring.
Thanks for writing this module in the first place, -Chris
Chouck, can you try adding some tests and creating a pull request for that?
I'm not sure this patch solves the issue I was having. I want to list just the top-level directories within a given bucket; once I have those, the patch becomes useful, since you can pass each directory/sub-bucket into the directory_list method.
UPDATE: This works wonderfully. Sorry for the preemptive comment.
(sorry for spam)
This patch doesn't entirely solve the problem of getting the objects within a given subdirectory: you still have to iterate over the entire collection and select just the objects you care about, à la:
```ruby
bucket = s3_service.buckets.find('some bucket')
all_objects_in_my_bucket = bucket.objects
objects_grouped_by_sub_dir = bucket.directory_list(:prefix => 'some directory with many sub directories').inject({}) { |memo, dir|
  # keep only the objects whose key starts with this prefix
  memo[dir] = all_objects_in_my_bucket.select { |o| o.key.start_with?(dir) }
  memo
}
```
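The difference between filtering locally and letting S3 group keys is easier to see with an in-memory emulation. The sketch below is hypothetical (`group_by_delimiter` is not part of the s3 gem) and approximates the S3 rule: keep keys that start with the prefix, and roll anything containing the delimiter after the prefix up to that delimiter. It also shows why a prefix test (`start_with?`) is safer than a substring test (`include?`) for this kind of filtering.

```ruby
# Hypothetical helper: approximates S3's prefix/delimiter grouping on a local key list.
# Returns [common_prefixes, direct_keys], both sorted.
def group_by_delimiter(keys, prefix = "", delimiter = "/")
  prefixes = []
  direct = []
  keys.each do |key|
    next unless key.start_with?(prefix) # prefix match, not substring match
    rest = key[prefix.length..-1]
    idx = rest.index(delimiter)
    if idx
      # roll the key up to the first delimiter after the prefix
      prefixes << prefix + rest[0, idx + delimiter.length]
    else
      direct << key
    end
  end
  [prefixes.uniq.sort, direct.sort]
end

keys = [
  "photos/2011/a.jpg",
  "photos/2011/b.jpg",
  "photos/2012/c.jpg",
  "photos/index.html",
  "backup/photos/2011/a.jpg" # contains "photos/" but does not start with it
]

dirs, files = group_by_delimiter(keys, "photos/")
p dirs  # => ["photos/2011/", "photos/2012/"]
p files # => ["photos/index.html"]

# A substring test also picks up the backup copy; a prefix test does not.
p keys.count { |k| k.include?("photos/2011/") }     # => 3
p keys.count { |k| k.start_with?("photos/2011/") }  # => 2
```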
There's got to be a better way.
True, but I think you and I are trying to solve different problems.
I have a huge tree of data and I wanted a list of sub-directories three layers down (they are dynamically generated, so I don't have a fixed list). I don't want all of the files in each of those sub-directories; in fact, I only want one or two out of the thousands in each sub-tree.
If I'm understanding what you are saying, it sounds like you want something more like:
```ruby
my_bucket = s3_service.buckets.find('some bucket')
prefix_list = my_bucket.directory_list(:prefix => 'some directory with many sub directories')
objects_grouped_by_sub_dir = {}
prefix_list.each { |prefix| objects_grouped_by_sub_dir[prefix] = my_bucket.objects.find_all(:prefix => prefix) }
```
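For the "sub-directories three layers down" case, the per-prefix listing can be applied level by level. The sketch below is hypothetical: `prefixes_at_depth` takes any callable that maps a prefix to its child prefixes. Against a real bucket that callable could wrap the original `directory_list` patch (which returns full prefixes, as needed to descend); here an in-memory hash stands in for S3 so the example runs on its own.

```ruby
# Hypothetical helper: breadth-first walk returning every prefix exactly
# `depth` levels below `root`. `list_children` maps a prefix to its child
# prefixes, e.g. ->(p) { bucket.directory_list(:prefix => p) } (assumed wiring).
def prefixes_at_depth(root, depth, list_children)
  level = [root]
  depth.times do
    level = level.flat_map { |prefix| list_children.call(prefix) }
  end
  level
end

# In-memory stand-in for S3 (sample data).
TREE = {
  "data/"     => ["data/a/", "data/b/"],
  "data/a/"   => ["data/a/x/", "data/a/y/"],
  "data/b/"   => ["data/b/z/"],
  "data/a/x/" => ["data/a/x/1/"],
  "data/a/y/" => [],
  "data/b/z/" => ["data/b/z/2/", "data/b/z/3/"]
}

fake_lister = ->(prefix) { TREE.fetch(prefix, []) }

p prefixes_at_depth("data/", 3, fake_lister)
# => ["data/a/x/1/", "data/b/z/2/", "data/b/z/3/"]
```

Only one listing request per prefix is made at each level, so the thousands of objects inside each sub-tree are never fetched; `find_all(:prefix => ...)` can then be called on just the final prefixes you care about.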
Yes! Guess I should've dug in on the source some more. Thanks.
Sorry to resurrect an old thread, but I really need this feature. Is there any progress on this, or a linked PR? I wouldn't mind creating one!
Also I expanded on what @chouck created:
```ruby
module S3
  class Bucket
    # this method recurses if the response coming back
    # from S3 includes a truncation flag (IsTruncated == 'true')
    # then parses the combined response(s) XML body
    # for CommonPrefixes/Prefix AKA directories
    def directory_list(options = {}, responses = [])
      options = {:delimiter => "/"}.merge(options)
      response = bucket_request(:get, :params => options)
      if is_truncated?(response.body)
        # request the next page, starting after the marker S3 handed back
        directory_list(options.merge(:marker => next_marker(response.body)), responses << response.body)
      else
        parse_xml_array(responses + [response.body], options)
      end
    end

    private

    # with clean_path, strip the leading prefix and the trailing delimiter,
    # e.g. "foo/bar/" listed with :prefix => "foo/" becomes "bar"
    def parse_xml_array(xml_array, options = {}, clean_path = true)
      prefix = options[:prefix] || ''
      delimiter = options[:delimiter] || ''
      names = []
      xml_array.each do |xml|
        rexml_document(xml).elements.each("ListBucketResult/CommonPrefixes/Prefix") do |e|
          name = e.text
          if clean_path
            # anchor the removals so a prefix/delimiter elsewhere in the name is kept
            name = name.sub(/\A#{Regexp.escape(prefix)}/, '')
            name = name.chomp(delimiter) unless delimiter.empty?
          end
          names << name
        end
      end
      names
    end

    def next_marker(xml)
      marker = nil
      rexml_document(xml).elements.each("ListBucketResult/NextMarker") { |e| marker ||= e.text }
      # S3 only returns NextMarker when a delimiter is given; fail loudly if it is missing
      raise StandardError, "truncated listing without NextMarker" if marker.nil?
      marker
    end

    def is_truncated?(xml)
      is_truncated = nil
      rexml_document(xml).elements.each("ListBucketResult/IsTruncated") { |e| is_truncated ||= e.text }
      is_truncated == 'true'
    end
  end
end
```
This handles listing directories when a response is truncated: S3 returns at most 1,000 keys per list request (the MaxKeys hard limit), so the method recurses, collecting every response before parsing and returning the results. I also added the ability to return "clean" directory names (folder names only) instead of the entire key/path.
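The same marker-based pagination can also be written as a plain loop, which avoids growing the call stack on very large listings. This is a hypothetical sketch: `fetch_page` stands in for one `bucket_request(:get, ...)` call plus parsing, and the `Page` struct (prefixes, `IsTruncated` flag, `NextMarker`) is an assumed shape, not something the s3 gem provides. A canned set of pages replaces S3 so it runs standalone.

```ruby
# One parsed ListBucketResult page (hypothetical structure).
Page = Struct.new(:prefixes, :truncated, :next_marker)

# Keep requesting pages, passing the previous NextMarker as the marker,
# until a page reports IsTruncated == false.
def all_prefixes(fetch_page)
  names = []
  marker = nil
  loop do
    page = fetch_page.call(marker)
    names.concat(page.prefixes)
    break unless page.truncated
    raise "truncated response without NextMarker" if page.next_marker.nil?
    marker = page.next_marker
  end
  names
end

# Canned pages keyed by the marker that requests them (sample data).
PAGES = {
  nil  => Page.new(["a/", "b/"], true, "b/"),
  "b/" => Page.new(["c/", "d/"], true, "d/"),
  "d/" => Page.new(["e/"], false, nil)
}

p all_prefixes(->(marker) { PAGES.fetch(marker) })
# => ["a/", "b/", "c/", "d/", "e/"]
```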