VirtualiZarr
Get a list of all the files referenced by a ManifestStore
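This would be useful for people opening Kerchunk references. The workaround below works today: it collects the unique storage locations (`scheme://bucket`) referenced by every array in the store. However, it relies on the private `_group` attribute (and on each manifest's private `_paths` attribute), so a public utility function would be better.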
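```python
from urllib.parse import urlparse
import numpy as np

def custom_function(path):
    # Reduce a full URL such as "s3://bucket/key.nc" to "s3://bucket"
    parsed = urlparse(path)
    return f"{parsed.scheme}://{parsed.netloc}"

# otypes=[str] fixes the output dtype and allows size-0 inputs
vectorized_func = np.vectorize(custom_function, otypes=[str])
group = manifest_store._group  # private attribute
buckets = []
for array in group.arrays.values():
    paths = array.manifest._paths
    paths = paths[paths != ""]  # skip missing chunks (assumed to have an empty path)
    buckets.append(np.unique(vectorized_func(paths)))
# TODO: Add recursive iteration for sub-groups
np.unique(np.concatenate(buckets))  # np.concat only exists in NumPy >= 2.0
```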
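A utility along these lines could also resolve the TODO by recursing into sub-groups. The sketch below is a minimal, self-contained illustration and does not import VirtualiZarr. `FakeManifest`, `FakeArray`, and `FakeGroup` are hypothetical stand-ins that mirror the attributes the workaround uses (`arrays`, `manifest._paths`). The `groups` attribute for sub-groups is an assumption, not VirtualiZarr's actual API, and empty strings are assumed to mark missing chunks. The sketch returns both the full file list (what the title asks for) and the `scheme://netloc` prefixes that the workaround computes.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse

import numpy as np


# Hypothetical stand-ins mirroring the attributes used in the workaround.
# `groups` (for sub-groups) is an assumed attribute name.
@dataclass
class FakeManifest:
    _paths: np.ndarray


@dataclass
class FakeArray:
    manifest: FakeManifest


@dataclass
class FakeGroup:
    arrays: dict = field(default_factory=dict)
    groups: dict = field(default_factory=dict)


def iter_paths(group):
    """Yield every chunk path in `group` and, recursively, in its sub-groups."""
    for array in group.arrays.values():
        yield from array.manifest._paths.ravel().tolist()
    for subgroup in group.groups.values():
        yield from iter_paths(subgroup)


def referenced_files(group):
    """Sorted unique file paths, skipping empty entries (assumed missing chunks)."""
    return sorted({path for path in iter_paths(group) if path})


def referenced_buckets(group):
    """Sorted unique scheme://netloc prefixes of the referenced files."""
    buckets = set()
    for path in referenced_files(group):
        parsed = urlparse(path)
        buckets.add(f"{parsed.scheme}://{parsed.netloc}")
    return sorted(buckets)


# Usage: a root group with one 2-D array plus a sub-group with a missing chunk.
inner = FakeGroup(
    arrays={"b": FakeArray(FakeManifest(np.array(["s3://bucket-b/y.nc", ""])))}
)
root = FakeGroup(
    arrays={
        "a": FakeArray(
            FakeManifest(np.array([["s3://bucket-a/x.nc", "s3://bucket-a/x.nc"]]))
        )
    },
    groups={"sub": inner},
)
print(referenced_files(root))    # ['s3://bucket-a/x.nc', 's3://bucket-b/y.nc']
print(referenced_buckets(root))  # ['s3://bucket-a', 's3://bucket-b']
```

Using Python sets instead of `np.vectorize`/`np.unique` keeps the helper independent of the manifest's string dtype. For very large manifests, a vectorized NumPy approach may be faster.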
The same utility would also be useful for virtual datasets.