amplify-category-api Total counts in GraphQL queries

Is your feature request related to a problem? Please describe. I think it's way past due that Amplify supports total counts in GraphQL queries. Other GraphQL based platforms has this built in by simply adding totalCount (or similar) in the query and no matter the limit, they'll get back the total count in addition to the (filtered) data.

Describe the solution you'd like This should at least work for DynamoDB backed models and of course also for search based queries that pass by ElasticSearch.

Describe alternatives you've considered Making a Lambda function that is a field in each model using the @function directive, but since we are both using listItems and searchItems with filters added, the implementation is not simple as we have to reapply those filters on the lambda function for getting the correct count.

Making custom resolvers seems like another not very fun route and not very scaleable or maintainable, and once again, this should be a "out of the box one liner" to have available as a developer. With either a Lambda or some other custom resolver I'm looking at hours or days of development.

Additional context This is a must have feature and there's not really any workaround for displaying total counts for systems with many items — at least that I know of. I read several bug reports, but none of them seems to have a simple solution. That it has not yet been developer by AWS is beyond my understanding, as pulling out counts is one of the most common things to do when developing web apps.

Jul 19 '19 01:07 houmark

There is a workaround for this and a reason why it is not automatically included as part of amplify - DynamoDB does not have an efficient way to count items, and can't in general perform aggregate operations. If you want to count items, write a custom lambda to read from the dynamo table stream and conditionally increment or decrement a counter on the table. Add a lambda resolver to read this counter value from the table, and add it to your schema.

Because dynamo can't perform aggregate operations, and you potentially want to get item counts after a filter expression is applied, having the cli provide this feature is a tall order.

Jul 21 '19 20:07 RossWilliams

Yeah, I aware of DynamoDB limitations, but asking the developer to develop correct counts is also a tall order. As I can have any combination of filters, I would need to count based on that and that seems like a huge implementation.

A count of items is something most apps use, so while I may be able to develop something that works decently, this feature request really is about either DynamoDB fixing this or Amplify creating some kind of sensible workaround that does not put all the workload on the developer.

Thanks for your feedback, the things you outline has been in my thoughts in relation to a workaround on this, but trying to come up with a sensible workaround that respects any type of filter on any table seems like something I could spend a week or two developing and may still be prone to wrong counts in edge situations.

Maybe ElasticSearch has counts of total results, even if there's a limit applied and this could be included in results as a decent workaround? We use ElasticSearch for most queries, as the default filters are too limited anyways.

Jul 21 '19 20:07 houmark

Elastic does provide this information, you can access it in the resolver template with: $ctx.result.hits.total.

There is a difficult disconnect between graphql and dynamodb development mindset. Dynamo DB requires planning of most access patterns in the design phase, while graphql makes an iterative approach seem straightforward. While amplify backed by dynamodb does offer some flexibility, it does require more ahead of time planning than other platforms, in this case with aggregates. If you know your aggregates now, and know they are stable, development of lambdas is doable. Solving the general case sounds much more difficult as long as your data is backed by dynamodb.

I agree that it would be helpful to have templates to speed up creation of lambda aggregate creators (including backfilling data).

Jul 21 '19 21:07 RossWilliams

Elastic does provide this information, you can access it in the resolver template with: $ctx.result.hits.total.

That sounds promising. Any hints on how I could create a @function directive to map to a resolver returning the total hits, without having to overwrite my default resolvers? I have not yet done any custom resolvers or overwritten in this project and I'm trying to avoid that. I have done a few @function directives that run a Lambda for other things (and that works well).

Jul 21 '19 21:07 houmark

Elasticsearch by default returns the total hit count, and by default its accurate up to around 10,000. So if you make a rest api call in your lambda you will get this back as a response. Just update you schema to include it as a property. You will likely need a new type to hold the items, nextToken, and total property.

Jul 21 '19 22:07 RossWilliams

I do think that $ctx.result.hits.total should be exposed by default since it is readily available.

In terms of DynamoDB support, I'm not too sure. I mean it would be nice to have, but if you have a large dataset just getting the count on every request will be expensive. Because every time you go over 1 MB of data, you have to paginate through to get totals. In reality, you could scan through a table made up of millions of records before you can get a final result, which obviously makes no sense even in terms of performance.

Jul 29 '19 07:07 jonmifsud

If we were to build this into Amplify in the future with DynamoDB, could you give a bit more detail on some things:

Is it just total count of all items in the table?
Is it total count for items matching a query? E.G. similar to SELECT COUNT(*) FROM ...
Would you be ok with Amplify attaching Lambda functions to your DynamoDB table, thus increasing your costs?

Any other specifics on your requirements or use cases would help us look into this in the future.

Aug 21 '19 04:08 undefobj

@undefobj

I would say it would have to be the total count of items matching a query. This would be primarily needed for pagination purposes, it's very hard to give a good user experience if we are supposedly showing X results at a time out of Y number of records. This would also need to work with custom indexes, as most likely that's the most efficient setup when listing tables unless we are using searchable.

I wouldn't mind if there are additional lambda functions added to the GraphQL Query, however, I think it would be wise if we are able to choose whether to request/run this or not. If we are paginating across 1000 results with 10 results a time, I don't think we need to necessarily run the total length query 10 times, it could be that the first request is enough. Also, there could be instances where I wouldn't care much about the count and as a result wouldn't want to run the lambda function for no reason. Potentially another way to go around this could be to have a 'cache'/dynamo table store for counts; with rules of how long count results may be valid for.

I hope that makes sense.

Aug 21 '19 06:08 jonmifsud

I agree with @jonmifsud — the main interest for us is to have a count of the filtered query, so we can do a proper pagination and show the total results of that query.

A full count would be nice, but if you do a normal query, shouldn't that result in a count of all items? Also, maybe a way to just return the count would be nice instead of having the count as a part of the result dataset would be nice.

I also would like to be able to choose when to use this and when to not, to avoid extra processes running with load time and costs overhead.

Aug 22 '19 21:08 houmark

You should be able to do pagination using nextToken without needing to know the count. However, what you propose above to have a count on the query results would be difficult. While not impossible, this feature would be quite difficult to do. I'll keep the issue open so that we can get other community feedback and measure interest, but for transparency it would need a lot of design and thought and wouldn't be delivered anytime soon.

Aug 23 '19 03:08 undefobj

Associating the word "impossible" for a query count in 2019 makes me cringe a bit. And more than that, it makes me wonder if selecting Amplify and all its (current) dependents was a very wrong choice.

The fact that DynamoDB does not do counts for its queries (besides a full table estimated count every ~6 hours) is simply a limitation the team working on DynamoDB should solve. Every single competitor to DynamoDB handles this without issues, so I'm sure those smart people can also come up with a solution that does not just benefit AppSync and Amplify users, but also people using DynamoDB directly. Maybe it will be near correct counts if millions and more precise when thousands like MySQL / InnoDB, and that would be way way better than having no clue whatsoever.

I am aware that using the nextToken I can make pagination but that paginator is somewhat less cool to look at from a UX perspective as I won't be able to show 1, 2, 3, 4....12 because I don't know how many pages I have. When someone wants to know how many items to we have fitting this filter, it cannot be that I have to pull them all out (only the id field) in the leanest way, and then count the array client-side?

I'm sure AWS compares themselves in some ways with other GraphQL services like Prisma etc. and they don't seem to have a problem supporting this.

This is a DynamoDB limitation. Attacking a solution on top of that for AppSync is the wrong angle, this needs to end on the table of AWS DynamoDB developers so they can come up with a sensible solution nearer to the root of the problem — everything else is a hack. Asking me to keep counts in a model/table myself when things update is even worse and not what you'd expect of a platform with an otherwise impressive feature set.

And if it's not possible for DynamoDB to solve this, then the Amplify / AppSync team should start considering built-in support for other major database players such as MongoDB, MySQL, Postgres, etc. so they are not being held down by a half-baked database that is backing the entire thing, but when that is considered, I am sure it looks way more interesting to just figure out a solution to counts and other minor limitations DynamoDB currently has.

Aug 23 '19 04:08 houmark

@undefobj What I would like to see to support this type of feature is to utilise kinesis firehose and glue to send data in Redshift or S3 Parquet. Then I could connect to Athena as a serverless analytics system, query Redshift for a non-serverless solution, or have another lambda pick up the objects from S3 and send them into Postgres. Amplify is well placed here as the api category can pick up on schema changes to re-run glue, amplify can make life easier creating a new firehose for each table, and setup the lambdas to put dynamo stream data into firehose. I realise its a big ask, but the question can be generalised into "how can amplify better support analysis workloads"

I see a smaller and faster win on just providing an aggregation template lambda to deal with uncomplicated counters. I would not like aggregation counters enabled by default, as I don't want to pay for what I don't use.

Aug 25 '19 16:08 RossWilliams

@undefobj What I would like to see to support this type of feature is to utilise kinesis firehose and glue to send data in Redshift or S3 Parquet. Then I could connect to Athena as a serverless analytics system, query Redshift for a non-serverless solution, or have another lambda pick up the objects from S3 and send them into Postgres. Amplify is well placed here as the api category can pick up on schema changes to re-run glue, amplify can make life easier creating a new firehose for each table, and setup the lambdas to put dynamo stream data into firehose. I realise its a big ask, but the question can be generalised into "how can amplify better support analysis workloads"

We do support Kinesis Streams in the Analytics category today: https://aws-amplify.github.io/docs/js/analytics#using-amazon-kinesis I have to check on the PR for supporting Firehose but I know it's in progress.

This seems to be an independent ask of the issue in this thread though as this is related to ingest. Total counts in a decoupled systems wouldn't be accurate on the pagination against DynamoDB as you'd run into consistency issues. For the analytics requests I think this would be a good ask in a separate Feature Request issue.

Aug 25 '19 18:08 undefobj

I wasn't as interested in strong consistency or using it for pagination, but the total count calculation and getting aggregations in general. The ask seems to be about bringing other database system capabilities into amplify, which is where ddb stream -> firehose would come in handy as a building block.

Aug 25 '19 20:08 RossWilliams

This got sidetracked by different ideas, but I'd like to know if we can expect total counts for at least Elasticsearch based queries soon?

@undefobj $ctx.result.hits.total is in the resolver response template already, so there should be very little code needed to get this wired in all the way. This would help with a decent workaround for overall total counts for now. It seems doing this against normal list queries against DynamoDB is not anything to expect soon.

Sep 03 '19 00:09 houmark

It seems like aws-amplify/amplify-cli#2600 made aws-amplify/amplify-cli#2602 happen, so here's to hoping that PR gets accepted fast.

Oct 21 '19 02:10 houmark

@houmark isn't that PR related to @searchable attribute? I want to know how many items are in a collection in total.. don't know if that PR solves this issue.

For example, a user has many books in their library, and I want to show on their profile page that how many books are there in their library. I don't want to load all of their books, just to count it. It might be thousands.

Oct 28 '19 22:10 sallar

Yeah, I first thought that PR would solve it due to the original code that was halfway baked in at the time. But that PR changed to just show the total of the returned items which is more or less useless because it's very easy to count the amount of results returned client side.

I don't think the Amplify team is working on a total results value due to the limitations and complications of passing through @auth.

Oct 29 '19 00:10 houmark

@houmark I wouldn't call it "useless" since if you're getting paginated results say 10 results at a time, you might want to get the total hits (if you have more than 10 results) to display information like the number of pages on your UI.

Oct 29 '19 00:10 kaustavghosh06

@kaustavghosh06 But how can I do that? I get 10 when I limit 10 in the query. If I limit 20, total is 20? If I am on the "last page" and limit 20 but there are only 5 results left I get 5, but all of that I can do with result.whatever.items.length client-side.

If I try to pull out 1000 then I hit another limitation which is the result set being too large (don't remember the exact error, but it errors, and if I query a lot but not too much to hit the limit, then it takes, of course, many seconds for result which in a UI leads to racing situations, and I cannot cancel queries due to another limitation, which has a PR but that PR has been stuck now for a good long time). Anyways, If I have 3k items, then it will still give me 1000 in total, so that does not really help me for pagination does it?

Am I missing something here?

Oct 29 '19 00:10 houmark

@houmark Do you have @auth on @searchable?

Oct 29 '19 00:10 kaustavghosh06

I'll speak to @SwaySway about this behavior. I would expect the total hits to be available to the client for pagination purposes.

Oct 29 '19 00:10 kaustavghosh06

That was what I expected also, but this changed in the PR because it would leak data about the total results if @auth is filtering out results due to "access control".

See his comment here: https://github.com/aws-amplify/amplify-cli/pull/2602#issuecomment-543844821

Oct 29 '19 00:10 houmark

For us, first of all we would not have a leak because we don't have advanced access control in our models, so we would get the right amount of results, but even if we did, I would have liked to at least have the flexibility to allow "leaking" and getting the result as a developer. This could come with a warning by Amplify, but as there's no alternative to get totals in any type of way in Amplify (besides maybe writing and maintaining resolvers ourselves which seems somewhat a bad path) it would be nice to have some indicator to be able to show in the UI (for example "around 2.604 results").

Of course, the optimal solution would be that @auth hooked on to the result and added in the total after filtering, but I understand that's a major undertaking and would potentially slow down all queries.

Oct 29 '19 00:10 houmark

I'm flabbergasted. How can a database system have a support for pagination, but give no information regarding the total results? How do you make the pagination UI then? Sure you can show "Next" until there isn't a nextToken anymore or do infinite scrolling, but those don't work for everyone. I almost cant believe it that DynamoDB can't give you a simple count.

Oct 31 '19 08:10 sallar

Too bad i wasn't aware of that limitation when the project had started. Looks like I have to switch over to RDBS like Aurora and do all the hard work of custom resolvers.

Nov 29 '19 08:11 psporysz

@psporysz dynamodb doesn’t prevent this UI pattern, it only requires that you do additional work. You can setup a streams trigger to keep a record count yourself. You can then fetch the count alongside the fetch for data to display a total count.

Or, you can sort the items with a monotonically increasing key and have a stream trigger insert a whole number count on the item, this would then allow a skip and limit pattern on a secondary index.

Unless there are other significant reasons, DynamoDB can still handle many patterns with a bit more work.

Nov 29 '19 14:11 RossWilliams

Or, you can sort the items with a monotonically increasing key and have a stream trigger insert a whole number count on the item, this would then allow a skip and limit pattern on a secondary index.

Let's say i have a secondary index for date field and I would like to know the number of items in between some date range. Can you explain how to apply above technique in this case?

Nov 29 '19 15:11 psporysz

@psporysz dynamodb doesn’t prevent this UI pattern, it only requires that you do additional work. You can setup a streams trigger to keep a record count yourself. You can then fetch the count alongside the fetch for data to display a total count.

Or, you can sort the items with a monotonically increasing key and have a stream trigger insert a whole number count on the item, this would then allow a skip and limit pattern on a secondary index.

Unless there are other significant reasons, DynamoDB can still handle many patterns with a bit more work.

What happens with 2 simultaneous posts? Is this safe under DynamoDB?

Nov 29 '19 15:11 davidbiller

Can anyone just give an example of how to get the the total hits from elastic search?

Dec 17 '19 13:12 odiaseo

amplify-category-api amplify-category-api copied to clipboard

Total counts in GraphQL queries

amplify-category-api
amplify-category-api copied to clipboard