Azure storage: list_blob_names is slow

Open selvavm opened this issue 3 years ago • 2 comments

  • Package Name: azure-storage-blob
  • Package Version: 12.14.1
  • Operating System: Windows
  • Python Version: Python 3.8

Describe the bug I have 30k files in an Azure storage account, laid out in the structure below, and calling list_blob_names takes 15+ minutes.

parquet
  |__ phone
    |__ name=iphone5
      |__ iphone.parquet
    |__ name=iphone5s
      |__ iphone5s.parquet
    |__ name=iphone6
      |__ iphone6.parquet

To Reproduce Steps to reproduce the behavior:

  1. Create 30k files laid out as above
  2. Execute the code below
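from azure.storage.blob import BlobServiceClient
# `credential` is not defined in the original report; an account key, SAS token,
# or azure.identity credential object is assumed here.
service = BlobServiceClient(account_url="https://my.blob.core.windows.net/", credential=credential)
c = service.get_container_client("parquet")
# Keep only the blob names that end in "parquet".
paths = [x for x in c.list_blob_names(name_starts_with='phone/name=') if x.endswith("parquet")]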

Expected behavior I expected to get the list of names in milliseconds

selvavm avatar Oct 28 '22 04:10 selvavm

Hi @selvavm - Thanks for opening an issue! Tagging the right people to take a look asap!

swathipil avatar Oct 28 '22 19:10 swathipil

Hi @selvavm, thanks for the report. This does seem much slower than expected for listing 30k blobs. One thing I do want to mention is that list_blob_names is a client-side convenience method that speeds up client-side processing of a List Blobs response. It still downloads all of the data from the service, so it is not any faster in terms of networking.
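Since the cost is in the List Blobs round trips rather than in the client-side parsing, one way to see where the 15 minutes goes is to time each response page separately. The following is only a rough sketch, not something from the SDK: the helper name time_listing_pages is made up for illustration, and it assumes the object returned by list_blob_names supports by_page(), as the SDK's paged results do.

```python
import time
from typing import Any, List, Tuple


def time_listing_pages(container_client: Any, prefix: str) -> Tuple[List[str], List[float]]:
    """Collect blob names under `prefix` and record the seconds spent on each page.

    `container_client` is assumed to behave like azure.storage.blob.ContainerClient.
    """
    names: List[str] = []
    page_seconds: List[float] = []

    # by_page() yields one iterator per List Blobs response page.
    pages = container_client.list_blob_names(name_starts_with=prefix).by_page()

    start = time.perf_counter()
    for page in pages:  # advancing `pages` is what requests the next page
        names.extend(page)
        now = time.perf_counter()
        page_seconds.append(now - start)
        start = now
    return names, page_seconds
```

With the container client c from the report, this would be called as names, secs = time_listing_pages(c, "phone/name="). A List Blobs response holds at most 5,000 results, so 30k names should need only a handful of pages; a much larger page count, or pages that each take many seconds, would suggest the time is being spent on the service side.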

I have a couple of questions to help narrow down what could be causing this:

  • Do you have hierarchical namespace (HNS) enabled on your Storage Account?
  • How many blobs are in the container total? You mentioned 30k but is this the total number of blobs or just the number of results your query returns?
  • Do you have blob soft-delete or blob versioning on your account? If so, are there a lot of soft-deleted blobs or old blob versions in the container?
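For the second and third questions, a small tally over a full listing may help. This is a sketch under assumptions, not an official diagnostic: summarize_listing is a hypothetical helper, and it assumes each listed item exposes deleted, version_id and is_current_version attributes the way BlobProperties does when the listing is requested with include=["deleted", "versions"].

```python
from collections import Counter
from typing import Any, Iterable


def summarize_listing(blobs: Iterable[Any]) -> Counter:
    """Tally current, soft-deleted, and non-current-version entries in a listing.

    Each item is assumed to look like azure.storage.blob.BlobProperties.
    """
    counts: Counter = Counter()
    for blob in blobs:
        counts["total"] += 1
        if getattr(blob, "deleted", False):
            # Soft-deleted blob still retained by the service.
            counts["soft_deleted"] += 1
        elif getattr(blob, "version_id", None) and not getattr(blob, "is_current_version", False):
            # Has a version id but is not flagged as the current version.
            counts["old_versions"] += 1
        else:
            counts["current"] += 1
    return counts
```

With the container client c from the report, a call could look like summarize_listing(c.list_blobs(name_starts_with="phone/name=", include=["deleted", "versions"])). Dropping name_starts_with would give the total for the whole container. It does not answer the HNS question, which is an account-level setting.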

jalauzon-msft avatar Nov 02 '22 20:11 jalauzon-msft

Is the prefix matching slowing it down? I also see very poor performance when using either of the methods to list blobs with a prefix.

hholst80 avatar Nov 18 '22 21:11 hholst80

Hi, we're sending this friendly reminder because we haven't heard back from you in a while. We need more information about this issue to help address it. Please be sure to give us your input within the next 7 days. If we don't hear back from you within 14 days of this comment the issue will be automatically closed. Thank you!

ghost avatar Nov 29 '22 02:11 ghost