azure-docs icon indicating copy to clipboard operation
azure-docs copied to clipboard

Inconsistencies in definition of a table scan

Open sharajmsft opened this issue 10 months ago • 9 comments

There is some inconsistency around table scans.

Quote 1, from here:

A Table Scan does not include the PartitionKey and is very inefficient because it searches all of the partitions that make up your table in turn for any matching entities. It will perform a table scan regardless of whether or not your filter uses the RowKey. For example: $filter=LastName eq 'Jones'

Table 1, from here: image

My questions: • What is the difference between the terms “table scan” (Quote 1) and “full table scan” (Table 1), if any? I could not find this anywhere. • Quote 1 says that “A Table Scan does not include the PartitionKey” yet Table 1 says that a partial PartitionKey match is possible for a “full table scan.” Which is correct?


Document Details

Do not edit this section. It is required for learn.microsoft.com ➟ GitHub issue linking.

sharajmsft avatar Apr 22 '24 23:04 sharajmsft

@sharajmsft Thanks for your feedback! We will investigate and update as appropriate.

TPavanBalaji avatar Apr 23 '24 04:04 TPavanBalaji

@sharajmsft Regarding your first question, "table scan" and "full table scan" are essentially the same thing. They both refer to a query that searches through all of the partitions in a table to find matching entities. The difference in terminology is likely due to the fact that different sources may use slightly different terminology to describe the same concept.

Regarding your second question, both statements are correct, but they are referring to slightly different scenarios. Quote 1 is referring to a scenario where you are searching for entities based on a filter that does not include the PartitionKey. In this case, a Table Scan will be performed, and it will search through all of the partitions in the table to find matching entities.

Table 1 is referring to a scenario where you are searching for entities based on a filter that includes a partial PartitionKey match. In this case, a full table scan may be performed, but it will only search through the partitions that match the partial PartitionKey. This is more efficient than a Table Scan, but it is still less efficient than a query that includes both the PartitionKey and RowKey.

PesalaPavan avatar Apr 24 '24 06:04 PesalaPavan

Thanks for the followup @PesalaPavan . I am still confused by your explanation.

You said:

In this case, a full table scan may be performed, but it will only search through the partitions that match the partial PartitionKey.

But you also said:

"table scan" and "full table scan"...both refer to a query that searches through all of the partitions in a table to find matching entities

  1. Does this term refer to a query that searches through only some of a table's partitions, all of a table's partitions or either?

Regarding:

a full table scan may be performed, but it will only search through the partitions that match the partial PartitionKey. This is more efficient than a Table Scan...

But you also said:

"table scan" and "full table scan" are essentially the same thing

  1. So how can a full table scan be more efficient than a table scan? Didn't you just say that these are equivalent?

sharajmsft avatar Apr 24 '24 20:04 sharajmsft

@sharajmsft I apologize for the confusion in my previous response. A full table scan is not more efficient than a table scan. In fact, a full table scan is generally less efficient than an index scan or a partition scan because it requires reading all of the data in the table, regardless of whether or not it matches the query criteria. A "full table scan" and a "table scan" are essentially the same thing. Both refer to a query that searches through all of the partitions in a table to find matching entities.

PesalaPavan avatar Apr 25 '24 09:04 PesalaPavan

@PesalaPavan I see, thanks for clearing that up. So then my final question is about what you said here:

Table 1 is referring to a scenario where you are searching for entities based on a filter that includes a partial PartitionKey match. In this case, a full table scan may be performed, but it will only search through the partitions that match the partial PartitionKey

This seems like a partition range scan to me, not a "full table scan" like you said. Am I incorrect? What is the difference?

sharajmsft avatar Apr 25 '24 16:04 sharajmsft

@sharajmsft You are correct, a partial PartitionKey match would result in a partition range scan, not a full table scan. A partition range scan is a type of query that uses the PartitionKey and filters on a range of RowKey values to return more than one entity. a full table scan searches all of the partitions that make up your table in turn for any matching entities. It does not include the PartitionKey and is very inefficient because it searches all of the partitions that make up your table in turn for any matching entities.

PesalaPavan avatar Apr 26 '24 05:04 PesalaPavan

Thanks for clarifying @PesalaPavan . In that case, I have recommendations for documentation improvements. Should I list them here or just create a PR?

sharajmsft avatar Apr 26 '24 17:04 sharajmsft

@sharajmsft Thanks for your feedback! I've assigned this issue to the author who will investigate and update as appropriate.

@akashdubey-ms Please check this and update.

PesalaPavan avatar Apr 29 '24 05:04 PesalaPavan

@akashdubey-ms I have the following suggestions:

  • Do not use both the terms "table scan" and "full table scan"--just choose one or the other, or note somewhere that these are equivalent.
  • In this section:
    • Change "Second best is a Range Query" to "Second best is a Row Range Query" to be more specific and consistent.
    • Change "Partition scan" to "Partition range scan" to be more specific and consistent.
    • Change "A Table Scan does not include the PartitionKey and is very inefficient because it searches all of the partitions that make up your table in turn for any matching entities." to "Fourth best is a Table Scan that does not include the PartitionKey, which is very inefficient because it searches all of the partitions that make up your table in turn for any matching entities." The new phrasing accommodates the fact that even a query that includes a partial partition key might still result in a table scan.

sharajmsft avatar Apr 29 '24 15:04 sharajmsft

Thanks for your contribution to our documentation

We are moving to a new user feedback system. As part of that effort, we are creating a content request in our backlog to address this gap and are closing this GitHub issue. The updated content will be published shortly.   Please continue to provide feedback about the documentation. We appreciate your contributions to our community.

#please-close

akashdubey-ms avatar Jul 28 '24 22:07 akashdubey-ms