azure-sdk-for-python
Azure storage: list_blob_names is slow
- Package Name: azure-storage-blob
- Package Version: 12.14.1
- Operating System: Windows
- Python Version: Python 3.8
Describe the bug: I have 30k files in an Azure Storage account in the structure below, and when I call list_blob_names it takes 15+ minutes.
parquet
|__ phone
    |__ name=iphone5
    |   |__ iphone.parquet
    |__ name=iphone5s
    |   |__ iphone5s.parquet
    |__ name=iphone6
        |__ iphone6.parquet
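To recreate a comparable layout for testing, the blob names can be generated programmatically. The sketch below assumes one .parquet file per name= folder; the helper name make_blob_names and the iphone{i} naming pattern are hypothetical stand-ins for the real model names. It only builds the names and does not upload anything.

```python
def make_blob_names(count, prefix="phone/name="):
    """Build `count` blob names shaped like phone/name=<model>/<model>.parquet.

    The model naming (iphone0, iphone1, ...) is a hypothetical stand-in
    for the real folder names in the report.
    """
    names = []
    for i in range(count):
        model = f"iphone{i}"
        names.append(f"{prefix}{model}/{model}.parquet")
    return names


if __name__ == "__main__":
    names = make_blob_names(30_000)
    print(len(names), names[0])
```

Each name could then be uploaded through the container client c from the repro code, for example with c.upload_blob(name, b""). Empty payloads keep the setup cheap, since listing returns per-blob properties rather than blob contents.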
To Reproduce:
- Create 30k files in the structure above
- Run the code below
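from azure.identity import DefaultAzureCredential  # assumption: any supported credential type works here
from azure.storage.blob import BlobServiceClient

credential = DefaultAzureCredential()
service = BlobServiceClient(account_url="https://my.blob.core.windows.net/", credential=credential)
c = service.get_container_client("parquet")
# The prefix is applied by the service; the .parquet suffix check runs client-side
paths = [x for x in c.list_blob_names(name_starts_with='phone/name=') if x.endswith(".parquet")]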
Expected behavior: I expected to get the list of names in milliseconds.
Hi @selvavm - Thanks for opening an issue! Tagging the right people to take a look asap!
Hi @selvavm, thanks for the report. This does seem much slower than expected for listing 30k blobs. One thing I do want to mention is that list_blob_names is a client-side convenience method that speeds up client-side processing of a List Blobs response. It still downloads all of the listing data from the service, so it is not faster in terms of networking.
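Since the listing arrives from the service one page at a time, one way to tell whether the time goes to individual slow requests or simply to many pages is to time each page separately. The sketch below is a generic helper (the name time_pages is hypothetical) that works on any iterable of pages; with the SDK, the pages would come from the by_page() method of the pager that list_blob_names returns.

```python
import time


def time_pages(pages):
    """Iterate over `pages` (an iterable of iterables) and record, per page,
    how many items it held and how long fetching plus consuming it took.

    Returns a list of (item_count, seconds) tuples.
    """
    stats = []
    iterator = iter(pages)
    while True:
        start = time.perf_counter()
        try:
            # With an SDK pager, the request for the next page is expected
            # to be issued here, so network time lands in this measurement.
            page = next(iterator)
        except StopIteration:
            break
        count = sum(1 for _ in page)  # drain the page so every item is processed
        stats.append((count, time.perf_counter() - start))
    return stats
```

For example, time_pages(c.list_blob_names(name_starts_with="phone/name=").by_page()) on the container client from the report would show the item count and duration of each page. A few very slow pages point at individual requests, while uniformly fast pages mean the total is driven by the number of round trips.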
I have a couple of questions to help narrow down what could be causing this:
- Do you have hierarchical namespace (HNS) enabled on your Storage Account?
- How many blobs are in the container in total? You mentioned 30k, but is this the total number of blobs or just the number of results your query returns?
- Do you have blob soft-delete or blob versioning on your account? If so, are there a lot of soft-deleted blobs or old blob versions in the container?
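To answer the last two questions, the listing can be asked to include soft-deleted blobs and older versions and then tallied. The sketch below assumes the include=["deleted", "versions"] option of ContainerClient.list_blobs and the deleted and is_current_version properties of the returned BlobProperties; the helper name summarize_listing is hypothetical, and because it only reads those two attributes it can be tried with stand-in objects.

```python
from collections import Counter


def summarize_listing(blobs):
    """Tally entries from a listing that includes deleted blobs and versions.

    Each entry is expected to expose `deleted` (bool) and
    `is_current_version` (True/False, or None when versioning is disabled),
    matching the attributes BlobProperties provides.
    """
    counts = Counter()
    for blob in blobs:
        if blob.deleted:
            counts["soft_deleted"] += 1
        elif blob.is_current_version is False:
            counts["previous_version"] += 1
        else:
            counts["live"] += 1
        counts["total"] += 1
    return counts
```

For instance, summarize_listing(c.list_blobs(include=["deleted", "versions"])) on the container client from the report would show whether soft-deleted blobs or old versions make up a large share of the container.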
Is the prefix matching slowing it down? I also see very poor performance when using either of the methods to list blobs with a prefix.
Hi, we're sending this friendly reminder because we haven't heard back from you in a while. We need more information about this issue to help address it. Please be sure to give us your input within the next 7 days. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!