azure-docs
azure-docs copied to clipboard
Inconsistencies in definition of a table scan
There is some inconsistency around table scans.
Quote 1, from here:
A Table Scan does not include the PartitionKey and is very inefficient because it searches all of the partitions that make up your table in turn for any matching entities. It will perform a table scan regardless of whether or not your filter uses the RowKey. For example: $filter=LastName eq 'Jones'
Table 1, from here:
My questions: • What is the difference between the terms “table scan” (Quote 1) and “full table scan” (Table 1), if any? I could not find this anywhere. • Quote 1 says that “A Table Scan does not include the PartitionKey” yet Table 1 says that a partial PartitionKey match is possible for a “full table scan.” Which is correct?
Document Details
⚠ Do not edit this section. It is required for learn.microsoft.com ➟ GitHub issue linking.
- ID: f7b96d3c-956f-8757-3479-ebc7a7186ab8
- Version Independent ID: 0a584aeb-09cd-549f-0ee3-1a61433b5512
- Content: Design Azure Table storage for queries
- Content Source: articles/storage/tables/table-storage-design-for-query.md
- Service: azure-table-storage
- GitHub Login: @akashdubey-ms
- Microsoft Alias: akashdubey
@sharajmsft Thanks for your feedback! We will investigate and update as appropriate.
@sharajmsft Regarding your first question, "table scan" and "full table scan" are essentially the same thing. They both refer to a query that searches through all of the partitions in a table to find matching entities. The difference in terminology is likely due to the fact that different sources may use slightly different terminology to describe the same concept.
Regarding your second question, both statements are correct, but they are referring to slightly different scenarios. Quote 1 is referring to a scenario where you are searching for entities based on a filter that does not include the PartitionKey. In this case, a Table Scan will be performed, and it will search through all of the partitions in the table to find matching entities.
Table 1 is referring to a scenario where you are searching for entities based on a filter that includes a partial PartitionKey match. In this case, a full table scan may be performed, but it will only search through the partitions that match the partial PartitionKey. This is more efficient than a Table Scan, but it is still less efficient than a query that includes both the PartitionKey and RowKey.
Thanks for the followup @PesalaPavan . I am still confused by your explanation.
You said:
In this case, a full table scan may be performed, but it will only search through the partitions that match the partial PartitionKey.
But you also said:
"table scan" and "full table scan"...both refer to a query that searches through all of the partitions in a table to find matching entities
- Does this term refer to a query that searches through only some of a table's partitions, all of a table's partitions or either?
Regarding:
a full table scan may be performed, but it will only search through the partitions that match the partial PartitionKey. This is more efficient than a Table Scan...
But you also said:
"table scan" and "full table scan" are essentially the same thing
- So how can a full table scan be more efficient than a table scan? Didn't you just say that these are equivalent?
@sharajmsft I apologize for the confusion in my previous response. A full table scan is not more efficient than a table scan. In fact, a full table scan is generally less efficient than an index scan or a partition scan because it requires reading all of the data in the table, regardless of whether or not it matches the query criteria. A "full table scan" and a "table scan" are essentially the same thing. Both refer to a query that searches through all of the partitions in a table to find matching entities.
@PesalaPavan I see, thanks for clearing that up. So then my final question is about what you said here:
Table 1 is referring to a scenario where you are searching for entities based on a filter that includes a partial PartitionKey match. In this case, a full table scan may be performed, but it will only search through the partitions that match the partial PartitionKey
This seems like a partition range scan to me, not a "full table scan" like you said. Am I incorrect? What is the difference?
@sharajmsft You are correct, a partial PartitionKey match would result in a partition range scan, not a full table scan. A partition range scan is a type of query that uses the PartitionKey and filters on a range of RowKey values to return more than one entity. a full table scan searches all of the partitions that make up your table in turn for any matching entities. It does not include the PartitionKey and is very inefficient because it searches all of the partitions that make up your table in turn for any matching entities.
Thanks for clarifying @PesalaPavan . In that case, I have recommendations for documentation improvements. Should I list them here or just create a PR?
@sharajmsft Thanks for your feedback! I've assigned this issue to the author who will investigate and update as appropriate.
@akashdubey-ms Please check this and update.
@akashdubey-ms I have the following suggestions:
- Do not use both the terms "table scan" and "full table scan"--just choose one or the other, or note somewhere that these are equivalent.
- In this section:
- Change "Second best is a Range Query" to "Second best is a Row Range Query" to be more specific and consistent.
- Change "Partition scan" to "Partition range scan" to be more specific and consistent.
- Change "A Table Scan does not include the PartitionKey and is very inefficient because it searches all of the partitions that make up your table in turn for any matching entities." to "Fourth best is a Table Scan that does not include the PartitionKey, which is very inefficient because it searches all of the partitions that make up your table in turn for any matching entities." The new phrasing accommodates the fact that even a query that includes a partial partition key might still result in a table scan.
Thanks for your contribution to our documentation
We are moving to a new user feedback system. As part of that effort, we are creating a content request in our backlog to address this gap and are closing this GitHub issue. The updated content will be published shortly. Please continue to provide feedback about the documentation. We appreciate your contributions to our community.
#please-close