amplify-android
Customize ds_pk on Delta Sync table created by Amplify
Amplify CLI Version
10.6.2
Question
Currently, we have an Android Amplify app that does Selective Sync from the backend and uses a specific GSI on the base table. So, the base query which uses the base table during the sync is all fine and good.
Delta sync, which uses the Delta Sync table, is a different story. Whenever our app users are most active, the Delta Sync table gets throttled despite being an on-demand table, because it is designed to use only two partition keys (for instance, foo-table-2023-07-27 and bar-table-2023-07-27). Setting the throttling issue aside, the RCU cost incurred for the Delta Sync table was always huge when the TTL was 27 hours, and it accounted for 99% of our DynamoDB cost. We reverted the TTL to 30 min, which again triggered the bug, which is again cost prohibitive, and we worked around that bug. This means we are back to doing a lot of base queries.
That raises the question: is there a way to customize the default pk and sk that Amplify creates for this Delta Sync table so that they align with our Selective Sync access pattern? That is:
- users always sync ONLY their data using the base query, and
- users also sync ONLY their data from lastSync using the delta query.
Hey @naveenkoduri , 👋 thanks for raising this! I'm going to transfer this over to our Android repository for better assistance 🙂.
@naveenkoduri - can you please provide the following?
- Your project's schema (found under /amplify/backend/api/[api name]/schema.graphql)
- Current sync expressions
- Any relevant DataStore code snippets that demonstrate how you are querying and subscribing to updates
Thank you!
@naveenkoduri - there is currently a PR out that changes the delta table partition key format for better sync performance with models having custom primary keys. New attributes are added to mutations and sync query resolvers to notify AppSync to use the newly improved data format.
Once this PR has been merged, the delta sync performance will be improved to better utilize the custom primary keys that you are using.
@david-mcafee, the PR you mentioned seems to apply to models that have a custom primary key. Just so you are aware, our model does not have a custom primary key: as you can see in the schema, we are not using the @primaryKey directive. Below is the information you asked for.
Test code
Project Schema:
type TextMessage @model @auth(rules: [
    {allow: owner, ownerField: "offId", provider: oidc, identityClaim: "offId"},
    {allow: owner, ownerField: "customerId", provider: oidc, identityClaim: "customerId"}]) {
  customerId: String!
  offId: String
  contactId: String!
  inId: String @index(name: "byInId", sortKeyFields: ["createdAt"], queryField: "messagesByInId")
  content: String
  subject: String
  contactFirstNm: String!
  contactLastNm: String!
  inFirstNm: String!
  inLastNm: String!
  createdAt: AWSDateTime
}
Sync Expression
DataStoreSyncExpression textMessageDataSyncExpression = () -> TextMessage.IN_ID.eq(inId);
DataStoreConfiguration datastoreBuilder = DataStoreConfiguration.builder()
    .syncExpression(TextMessage.class, textMessageDataSyncExpression)
    .syncPageSize(1000)
    .syncMaxRecords(25000)
    .build();
Datastore
Amplify.DataStore.observe(TextMessage.class,
    cancelable -> {
        cancelableMessageSubscription = new AtomicReference<>(cancelable);
        Log.d("logging");
    },
    messageReceived -> {
        // Code to discard any duplicate events
    },
    failure -> {
        DataStoreException dataStoreException = failure;
        Log.e("logging");
        cancelSubscription();
        resume();
    },
    () -> Log.i("Observation complete for TextMessages."));
CloudFormation generated for the delta table with the latest Amplify CLI (12.2.5). As you can see below, the CLI creates no GSI for the delta table. Hence, irrespective of how Amplify builds the Delta query request, there is no way to query records by inId in DynamoDB unless the data in the PK were expected to be in the format {tablename}-{inId}-{yyyy-mm-dd}. Currently, the data is in the format {tablename}-{yyyy-mm-dd}.
DataStore:
  Type: AWS::DynamoDB::Table
  Properties:
    KeySchema:
      - AttributeName: ds_pk
        KeyType: HASH
      - AttributeName: ds_sk
        KeyType: RANGE
    AttributeDefinitions:
      - AttributeName: ds_pk
        AttributeType: S
      - AttributeName: ds_sk
        AttributeType: S
    BillingMode: PAY_PER_REQUEST
    StreamSpecification:
      StreamViewType: NEW_AND_OLD_IMAGES
    TableName:
      Fn::Join:
        - ''
        - - AmplifyDataStore-
          - Fn::GetAtt:
              - GraphQLAPI
              - ApiId
          - '-'
          - Ref: env
    TimeToLiveSpecification:
      AttributeName: _ttl
      Enabled: true
  UpdateReplacePolicy: Delete
  DeletionPolicy: Delete
Also, I just want to reassure you that there is no issue with syncing data on the user's device. If the user has 10 messages, the user's AmplifyDatastore.db always has only 10 messages, and the RCU is significantly lower whenever the Base query is done. The inefficiency is only with the Delta query, where there is no option to query records by a PK expression to begin with, unless I am missing something.
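To make the two partition key formats discussed above concrete, here is a minimal Java sketch. It only builds strings: `currentKey` follows the `{tablename}-{yyyy-mm-dd}` format reported in the thread, while `perUserKey` follows the `{tablename}-{inId}-{yyyy-mm-dd}` format that is floated as a hypothetical and is not something Amplify is known to generate. The class and method names are illustrative assumptions.

```java
import java.time.LocalDate;

public class DeltaKeyFormats {
    // Format reported in the thread: {tablename}-{yyyy-mm-dd}.
    // Every row written for a table on a given day shares this one key.
    static String currentKey(String tableName, LocalDate day) {
        return tableName + "-" + day; // LocalDate prints as yyyy-MM-dd
    }

    // Hypothetical format from the thread: {tablename}-{inId}-{yyyy-mm-dd}.
    // Not an Amplify feature; shown only to illustrate the idea.
    static String perUserKey(String tableName, String inId, LocalDate day) {
        return tableName + "-" + inId + "-" + day;
    }

    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2023, 7, 27);
        // Two different users land on the same partition key today...
        System.out.println(currentKey("foo-table", day));
        System.out.println(currentKey("foo-table", day));
        // ...but would get separate keys under the hypothetical format.
        System.out.println(perUserKey("foo-table", "foo#1111", day));
        System.out.println(perUserKey("foo-table", "foo#2222", day));
    }
}
```

With the current format, the key carries no per-user component, which is why the delta query can only narrow results by user with a filter.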
I happened to find this in the AppSync Sync operation documentation, but I am not sure what we have to do in the Amplify schema to have the mentioned "deltaIndex" generated by Amplify.
deltaIndexName
The index used for the Sync operation. This index is required to enable a Sync operation on the whole delta store table when the table uses a custom partition key. The Sync operation will be performed on the GSI (created on gsi_ds_pk and gsi_ds_sk). This field is optional.
Adding a sample request received by AppSync and the transformed request sent to DynamoDB for both the Base Query and the Delta Query. The GraphQL query looks almost the same in both cases, but the transformed request differs.
Base Query
AppSync Query:
f0dde0a6-a02b-438a-a8fa-e020caaa4551 GraphQL Query:
query SyncTextMessages($filter: ModelTextMessageFilterInput, $lastSync: AWSTimestamp, $limit: Int) {
syncTextMessages(filter: $filter, lastSync: $lastSync, limit: $limit) {
items {
id
foo
bar
}
nextToken
startedAt
}
}
, Operation: null, Variables: {
"filter": {
"and": [
{
"inId": {
"eq": "foo#1111"
}
}
]
},
"limit": 1000,
"lastSync": 1691202372480
}
TransformedTemplate: Please notice that the transformed request has an index name and uses a query as opposed to a filter
{
  "version": "2018-05-29",
  "operation": "Sync",
  "limit": 1000,
  "lastSync": 1691202372480,
  "query": {
    "expression": "#pk = :pk",
    "expressionNames": {
      "#pk": "inId"
    },
    "expressionValues": {
      ":pk": {
        "S": "foo#1111"
      }
    }
  },
  "scanIndexForward": true,
  "filter": {
    "expression": "",
    "expressionNames": {},
    "expressionValues": {}
  },
  "index": "byInIdIndex"
}
Delta Query
AppSync Query:
397b753e-51ab-4273-8202-965d06125bc6 GraphQL Query: query SyncTextMessages($filter: ModelTextMessageFilterInput, $lastSync: AWSTimestamp, $limit: Int) {
syncTextMessages(filter: $filter, lastSync: $lastSync, limit: $limit) {
items {
id
foo
bar
}
nextToken
startedAt
}
}
, Operation: null, Variables: {
"filter": {
"and": [
{
"inId": {
"eq": "foo#2222"
}
}
]
},
"limit": 1000,
"lastSync": 1691211580443
}
TransformedTemplate: Please notice that only the filter is used
{
  "version": "2018-05-29",
  "operation": "Sync",
  "limit": 1000,
  "nextToken": null,
  "lastSync": 1691211580443,
  "filter": {
    "expression": "(#inId = :and_0_inId_eq)",
    "expressionNames": {
      "#inId": "inId"
    },
    "expressionValues": {
      ":and_0_inId_eq": {
        "S": "foo#2222"
      }
    }
  }
}
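The cost difference between the two transformed requests can be illustrated with a small in-memory simulation. This is a sketch under stated assumptions, not a model of AppSync internals: it relies on the documented DynamoDB behavior that a filter expression is applied after items are read, so read capacity is charged for everything the key condition matches. The item size (1 KB), the user count, the `user#N` ids, and the use of eventually consistent pricing (0.5 RCU per 4 KB) are all illustrative assumptions, and the 1 MB page limit and the sort-key range on lastSync are ignored.

```java
import java.util.ArrayList;
import java.util.List;

public class DeltaReadCost {
    /** One simulated delta-table row; sizeBytes is an illustrative item size. */
    static final class Row {
        final String dsPk;
        final String inId;
        final int sizeBytes;
        Row(String dsPk, String inId, int sizeBytes) {
            this.dsPk = dsPk;
            this.inId = inId;
            this.sizeBytes = sizeBytes;
        }
    }

    /** Rows read from storage versus rows returned to the caller. */
    static final class ReadResult {
        final int examined;
        final int returned;
        final long bytesRead;
        ReadResult(int examined, int returned, long bytesRead) {
            this.examined = examined;
            this.returned = returned;
            this.bytesRead = bytesRead;
        }
        // Assumes eventually consistent reads: 0.5 RCU per 4 KB read, rounded up.
        double estimatedRcu() {
            return Math.ceil(bytesRead / 4096.0) * 0.5;
        }
    }

    // Delta-style read: key condition on ds_pk only, inId applied as a filter.
    static ReadResult queryThenFilter(List<Row> table, String dsPk, String inId) {
        int examined = 0;
        int returned = 0;
        long bytes = 0;
        for (Row row : table) {
            if (!row.dsPk.equals(dsPk)) continue; // key condition
            examined++;
            bytes += row.sizeBytes;               // billed whether or not it matches
            if (row.inId.equals(inId)) returned++; // filter runs after the read
        }
        return new ReadResult(examined, returned, bytes);
    }

    // Base-style read: inId is the index key, so only matching rows are read.
    static ReadResult queryByInId(List<Row> table, String inId) {
        int matched = 0;
        long bytes = 0;
        for (Row row : table) {
            if (!row.inId.equals(inId)) continue;
            matched++;
            bytes += row.sizeBytes;
        }
        return new ReadResult(matched, matched, bytes);
    }

    // 100 hypothetical users with 10 rows of 1 KB each, all under one daily key.
    static List<Row> sampleTable() {
        List<Row> table = new ArrayList<>();
        for (int user = 0; user < 100; user++) {
            for (int i = 0; i < 10; i++) {
                table.add(new Row("TextMessage-2023-08-05", "user#" + user, 1024));
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<Row> table = sampleTable();
        ReadResult delta = queryThenFilter(table, "TextMessage-2023-08-05", "user#42");
        ReadResult base = queryByInId(table, "user#42");
        System.out.println("delta: examined=" + delta.examined
                + " returned=" + delta.returned + " rcu=" + delta.estimatedRcu());
        System.out.println("base:  examined=" + base.examined
                + " returned=" + base.returned + " rcu=" + base.estimatedRcu());
    }
}
```

Both reads return the same 10 rows, but in this toy setup the filtered read examines all 1000 rows in the shared partition, so its estimated cost grows with total traffic rather than with the individual user's data.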
@naveenkoduri - it looks like you tried both a 27 hour TTL for the delta table, as well as a 30 minute TTL. Have you tried experimenting with a TTL configuration in between those values? If not, I would recommend starting with a 2 hour TTL to see if that helps.
I also wanted to follow up on your other comments:
- The transformed templates that you include look correct (i.e. there isn't a bug).
- You may already be aware, but I also wanted to point out that since you are using the @index directive on the inId field, and your sync expression is set to filter on that value, you are performing a query instead of a scan when performing a base sync (meaning highly efficient and cost-effective data retrieval).
- Regarding your questions on Delta sync: DataStore queries using the table name on the partition key (and date and time with the sort key), and then applies the filter. That’s a more costly operation compared to querying the base table if there are too many records added to one specific table in a short time. However, if you updated your schema to use a custom primary key, there would be a huge performance improvement once the PR I linked above is implemented.
If updating the TTL does not help and / or custom primary keys are not an option for you, please let me know! Thanks!
@david-mcafee, thanks for looking further into it. Currently, we have a TTL of 30 min for one table and 5 min for another. Even with these TTL settings, we are seeing the DataStore/Delta Sync table consume more RCU than the Base table for the traffic we have. The one option we know will improve things is going down to 5 min on the other table as well, because the number of records in the Delta Sync table will decrease, leaving fewer records for the Delta Query to read. However, this would pose another problem in the future as we grow: a lot of Sync requests would go to the Base table and return up to 25000 records.
Regarding the primaryKey, we do need it to be an auto-generated UUID, as the PK identifies a specific text message between two parties.
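The TTL trade-off described in this comment can be sketched as a simple decision rule. This is an assumption based on a reading of the AppSync Sync operation documentation, not a statement of how AppSync is implemented: a request with no lastSync, or with a lastSync older than the delta table's TTL window, is served from the base table, and anything more recent is served from the delta table. The class, enum, and method names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

public class SyncSourceChooser {
    enum Source { BASE_TABLE, DELTA_TABLE }

    // Assumed rule: the delta table only covers changes newer than (now - TTL).
    // lastSync == null models a first sync with no previous timestamp.
    static Source choose(Instant lastSync, Instant now, Duration deltaTtl) {
        if (lastSync == null) {
            return Source.BASE_TABLE;
        }
        Instant oldestCovered = now.minus(deltaTtl);
        return lastSync.isBefore(oldestCovered) ? Source.BASE_TABLE : Source.DELTA_TABLE;
    }

    public static void main(String[] args) {
        // lastSync value taken from the delta query sample in the thread.
        Instant lastSync = Instant.ofEpochMilli(1691211580443L);
        // Hypothetical client that comes back 45 minutes later.
        Instant now = lastSync.plus(Duration.ofMinutes(45));

        System.out.println("TTL 30 min -> " + choose(lastSync, now, Duration.ofMinutes(30)));
        System.out.println("TTL 2 h    -> " + choose(lastSync, now, Duration.ofHours(2)));
    }
}
```

Under this assumed rule, a client that returns after 45 minutes falls back to the base table with a 30 minute TTL but stays on the delta table with a 2 hour TTL, which is the tension between delta table size and base query volume raised above.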