
Get a list of all the files referenced by a ManifestStore

Open maxrjones opened this issue 4 months ago • 1 comment

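This would be useful for people opening Kerchunk references. The code below works as a workaround today (it collects the unique scheme-and-bucket prefixes of the referenced files), but it would be best to provide a utility function, since the workaround depends on the private _group attribute.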

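from urllib.parse import urlparse

import numpy as np

def custom_function(path):
    # Reduce a URL to its scheme and bucket, e.g. "s3://my-bucket"
    parsed = urlparse(path)
    return f"{parsed.scheme}://{parsed.netloc}"

vectorized_func = np.vectorize(custom_function)
group = manifest_store._group  # private attribute, may change without notice
buckets = []
for array in group.arrays.values():
    buckets.append(np.unique(vectorized_func(array.manifest._paths)))
# TODO: Add recursive iteration for sub-groups
# np.concatenate also works on NumPy < 2.0, where np.concat does not exist
np.unique(np.concatenate(buckets))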
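
For discussion, here is a rough sketch of what such a utility might look like if it returned the files themselves rather than only the buckets. It is not VirtualiZarr API: `referenced_files` and `referenced_buckets` are hypothetical names, the helpers take plain iterables of path strings so they can be tried without a store, and the sketch assumes that an empty string marks a chunk with no backing file.

```python
from collections.abc import Iterable
from urllib.parse import urlparse


def referenced_files(path_collections: Iterable[Iterable[str]]) -> list[str]:
    """Return the sorted unique paths found across all collections.

    Hypothetical helper. Empty strings are skipped on the assumption
    that they mark chunks with no backing file.
    """
    unique = set()
    for paths in path_collections:
        unique.update(str(p) for p in paths if p)
    return sorted(unique)


def referenced_buckets(files: Iterable[str]) -> list[str]:
    """Return the sorted unique "scheme://netloc" prefixes of the given paths."""
    buckets = set()
    for path in files:
        parsed = urlparse(path)
        buckets.add(f"{parsed.scheme}://{parsed.netloc}")
    return sorted(buckets)


# Stand-in data: one inner list per array, in place of real manifest paths.
example = [
    ["s3://bucket-a/file1.nc", "s3://bucket-a/file1.nc", ""],
    ["s3://bucket-b/file2.nc", "s3://bucket-a/file3.nc"],
]
files = referenced_files(example)
buckets = referenced_buckets(files)
```

With a real store, the input could presumably be built from the same private attributes as above, for example `(a.manifest._paths.ravel() for a in manifest_store._group.arrays.values())`, which is exactly the dependency a public utility would remove.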

maxrjones avatar Aug 26 '25 21:08 maxrjones

This would be useful on virtual datasets too.

TomNicholas avatar Aug 27 '25 01:08 TomNicholas