Changes

Added a new method to merge session tokens, for customers who want to keep track of their own session tokens.
Added a new API for converting a logical partition key to a feed range.
Added a new API for checking whether a feed range is a subset of another feed range.

APIs

Container.py
def merge_session_tokens(feed_ranges_to_session_tokens: List, target_feed_range: str) -> str
def feed_range_for_logical_partition(pk: PartitionKey) -> str
def is_feed_range_subset(parent_feed_range: str, child_feed_range: str) -> bool
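
To make the subset check concrete, here is a minimal sketch. It assumes a feed range can be modelled as a half-open interval over a hashed key space; the `Interval` class and `is_subset` function are hypothetical illustrations, not the SDK's representation or implementation, which works on feed range strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical feed range model: half-open [start, end) over a hashed key space."""
    start: int
    end: int


def is_subset(parent: Interval, child: Interval) -> bool:
    # The child is contained when it starts no earlier and ends no later than the parent.
    return parent.start <= child.start and child.end <= parent.end


print(is_subset(Interval(0, 100), Interval(10, 20)))  # True
print(is_subset(Interval(0, 50), Interval(40, 60)))   # False
```

Under this model the check is a pure comparison of two ranges and needs no call to the service.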

Samples

# This would be happening through different clients 
feed_ranges_and_session_tokens = []
for doc in docs_to_create:
    container.create_item(doc)
    # the feed range returned in the request context will correspond to the logical partition key 
    feed_range = container.client_connection.last_response_headers["request-context"]["feed-range"]
    session_token = container.client_connection.last_response_headers["request-context"]["session-token"]
    feed_ranges_and_session_tokens.append((feed_range, session_token))

# Note: in these examples, the list of feed ranges and session tokens would be
# aggregated from different clients.
# Each example below gets the most up-to-date session token for a target_feed_range.

# ---------------------1. using logical partition key ---------------------------------------------------
# could also use the one stored from the response headers
for logical_pk in logical_pks:
    target_feed_range = container.feed_range_for_logical_partition(logical_pk)
    updated_session_token = container.merge_session_tokens(feed_ranges_and_session_tokens, target_feed_range)

# ---------------------2. using arbitrary feed range ----------------------------------------------------
container_feed_ranges = container.read_feed_ranges()

for target_feed_range in container_feed_ranges:
    updated_session_token = container.merge_session_tokens(feed_ranges_and_session_tokens, target_feed_range)

# ---------------------3. using physical partitions -----------------------------------------------------
target_feed_range = container.feed_range_for_logical_partition(logical_pk)

updated_session_token = container.merge_session_tokens(feed_ranges_and_session_tokens, target_feed_range)
# ------------------------------------------------------------------------------------------------------
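
The samples above collect (feed range, session token) pairs and ask for the most up-to-date token for a target feed range. The following toy model illustrates one way to read that. It is not the SDK's algorithm: feed ranges are assumed to be half-open integer intervals, the helper names are hypothetical, and tokens are compared by GlobalLSN only, ignoring version numbers and per-region local LSNs.

```python
from typing import Dict, List, Tuple

# Hypothetical model: a feed range is a half-open [start, end) integer interval.
FeedRange = Tuple[int, int]


def overlaps(a: FeedRange, b: FeedRange) -> bool:
    """Return True when the two half-open intervals share at least one point."""
    return a[0] < b[1] and b[0] < a[1]


def global_lsn(session_token: str) -> int:
    """Extract GlobalLSN from 'PKRangeId:VersionNumber#GlobalLSN#...'."""
    vector = session_token.split(":", 1)[1]
    return int(vector.split("#")[1])


def latest_tokens_for(pairs: List[Tuple[FeedRange, str]], target: FeedRange) -> str:
    """Keep, per PKRangeId, the overlapping token with the highest GlobalLSN.

    Returns a comma-separated compound token, or an empty string if nothing overlaps.
    """
    best: Dict[str, str] = {}
    for feed_range, token in pairs:
        if not overlaps(feed_range, target):
            continue
        pk_range_id = token.split(":", 1)[0]
        if pk_range_id not in best or global_lsn(token) > global_lsn(best[pk_range_id]):
            best[pk_range_id] = token
    return ",".join(best[key] for key in sorted(best))


# Made-up tokens for illustration only.
pairs = [((0, 50), "0:1#10"), ((50, 100), "1:1#99"), ((0, 50), "0:1#25")]
print(latest_tokens_for(pairs, (10, 20)))
print(latest_tokens_for(pairs, (0, 100)))
```

A target that spans several partition key ranges yields a compound token in this sketch, one entry per PKRangeId.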

Implementation

Glossary

Session Token Format: PKRangeId:VersionNumber#GlobalLSN#RegionId1=LocalLSN1#RegionId2=LocalLSN2...
Compound session token: Comma-separated session tokens
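
As an illustration of the format above, a session token can be split into its parts with plain string operations. This is only a sketch: `ParsedSessionToken`, `parse_session_token`, and `parse_compound_session_token` are hypothetical names, not part of the SDK, and the sample token values are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ParsedSessionToken:
    pk_range_id: str
    version: int
    global_lsn: int
    # Maps region id -> local LSN for that region.
    local_lsns: Dict[str, int] = field(default_factory=dict)


def parse_session_token(token: str) -> ParsedSessionToken:
    """Parse 'PKRangeId:VersionNumber#GlobalLSN#RegionId1=LocalLSN1#...'."""
    pk_range_id, sep, vector = token.partition(":")
    if not sep or not pk_range_id or not vector:
        raise ValueError(f"malformed session token: {token!r}")
    parts = vector.split("#")
    if len(parts) < 2:
        raise ValueError(f"missing version or global LSN: {token!r}")
    local_lsns: Dict[str, int] = {}
    for segment in parts[2:]:
        region_id, eq, lsn = segment.partition("=")
        if not eq:
            raise ValueError(f"malformed region segment: {segment!r}")
        local_lsns[region_id] = int(lsn)
    return ParsedSessionToken(pk_range_id, int(parts[0]), int(parts[1]), local_lsns)


def parse_compound_session_token(compound: str) -> List[ParsedSessionToken]:
    """Split a comma-separated compound token and parse each entry."""
    return [parse_session_token(t.strip()) for t in compound.split(",") if t.strip()]


parsed = parse_session_token("0:1#100#1=20#2=5")
print(parsed.pk_range_id, parsed.version, parsed.global_lsn, parsed.local_lsns)
```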

Aug 21 '24 05:08 tvaron3

API change check

APIView has identified API level changes in this PR and created following API reviews.

azure-cosmos

Aug 21 '24 06:08 azure-sdk

Of the above operations, can you flag which ones involve reads/writes to the actual container and which are local? We are curious about the metadata read behavior (and latency) of the example flows.

With respect to the artificial feed ranges case, from the API it appears that it is a client-only change. Why is it out of scope?

Sep 14 '24 15:09 nickcoai

@nickcoai The operations are listed below, along with whether they require metadata calls. None of them makes a metadata call on every invocation, because the relevant information is cached. Latency for these flows should be low, since most invocations will not require a metadata call.

def get_updated_session_token(feed_ranges_to_session_tokens: List, target_feed_range: str) -> str
  Requires no metadata calls.
def feed_range_for_logical_partition(pk: PartitionKey) -> FeedRange
  There could be metadata calls for the collection properties, but they are cached.
def is_feed_range_subset(parent_feed_range: str, child_feed_range: str) -> bool
  Currently, no metadata calls are necessary for this, but the feed range implementation is being worked on in parallel, so this could change. I will update the PR accordingly.
def read_feed_ranges(num_of_ranges: int) -> List
  This sometimes requires metadata calls because it relies on the pkrange cache.
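
The caching behavior described above can be sketched as follows. This is a generic read-through cache, not the SDK's internal cache: `MetadataCache`, `fake_fetch`, and the property names are hypothetical stand-ins for a network metadata read.

```python
from typing import Callable, Dict


class MetadataCache:
    """Read-through cache: the fetch function runs only on the first lookup per key."""

    def __init__(self, fetch: Callable[[str], dict]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, dict] = {}
        self.fetch_count = 0  # counts simulated metadata calls

    def get(self, container_id: str) -> dict:
        if container_id not in self._cache:
            self.fetch_count += 1
            self._cache[container_id] = self._fetch(container_id)
        return self._cache[container_id]


def fake_fetch(container_id: str) -> dict:
    # Hypothetical stand-in for reading collection properties from the service.
    return {"id": container_id, "partition_key_path": "/pk"}


cache = MetadataCache(fake_fetch)
for _ in range(3):
    cache.get("orders")
print(cache.fetch_count)  # 1
```

Only the first lookup pays for the metadata read; the later ones are served locally, which is why per-invocation latency stays low in this model.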

For the artificial feed ranges case, artificial feed ranges can easily have negative side effects (not necessarily for session token merge, but when they are used as a scoping filter for query/change feed) if the service cannot apply them effectively. Minimizing the surface area also minimizes the risk of sending customers down a route that results in issues later. This is why we left it out of scope for this PR.

Sep 17 '24 02:09 tvaron3

/azp run python - cosmos - tests

Oct 08 '24 01:10 tvaron3