task-execution-schemas
Pagination support
The current API responses lack adequate support for comprehensive pagination. The API offers only a `next_page_token` cursor for navigating to the next page; accessing the previous page requires client-side caching of cursors. In addition, elements essential for effective pagination, such as the total number of tasks and pages, are missing.
Strongly agree with this. The current pagination is really only useful for limiting the number of tasks returned by `GET /tasks`; for effective pagination in a GUI client it's not very helpful.
Perhaps we could look at the GA4GH pagination style guide that TASC is currently drafting in response to this issue (https://github.com/ga4gh/TASC/issues/29). It lists a couple of solutions and discusses their pros and cons.
It'd be nice to see pagination where the user can:
- Go forward and backward
- Jump to pages
Maybe adding an extra cursor, `page_token_prev`, plus a `page_num` field could be a good place to start. The API should also allow users to fetch a specific page directly, so that a user can, say, jump to page 4 and skip all the pages in between. Apart from that, `total_results` and `total_page_number` would also be essential; these would let the user jump straight to the last page.
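For illustration, a page-based `GET /tasks` response along these lines might look like the sketch below. The field names (`page_num`, `total_results`, `total_page_number`) are suggestions from this comment, not part of the current TES spec:

```python
import math

# Sketch of a page-based GET /tasks handler (hypothetical fields
# suggested in this thread; not part of the current TES spec).
def list_tasks(all_tasks, page_num, page_size):
    total_results = len(all_tasks)
    total_page_number = math.ceil(total_results / page_size)
    start = (page_num - 1) * page_size
    return {
        "tasks": all_tasks[start:start + page_size],
        "page_num": page_num,
        "page_size": page_size,
        "total_results": total_results,
        "total_page_number": total_page_number,
    }

# A client can jump straight to page 4, skipping the middle pages:
page = list_tasks([f"task-{i}" for i in range(1, 101)],
                  page_num=4, page_size=10)
```

With `total_page_number` in the response, the same call with `page_num=total_page_number` jumps straight to the last page.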
@andrewyatz: Do you think these use cases are covered in the TASC pagination guide?
The pagination guide would suggest not using tokens, for exactly these reasons, and instead paging using counts/offsets. Tokens have been written up as preferring streaming; it's not that you cannot replay paginated streams (as in Kafka), it's just implementation-specific.
I'd like to know if this is a "good" solution to the above
Thanks @andrewyatz.
@JaeAeich: Would either of the two solutions described in the GA4GH pagination guide draft address your use case? If not, please leave feedback :pray:
When datasets may change between requests, clients SHOULD be tolerant of skipped or redundant entries between pages, and servers SHOULD consider pagination stability requirements when choosing to offer a page-based pagination scheme.
How would a client be tolerant of data it's unaware of? Let's say I fetched page n, and while I'm fetching page n + 1 the dataset changes and an entry gets added to page n; this means I'll receive a redundant entry that my client already saw on page n. We could work around this by implementing some kind of cache invalidation, but I think this should be the server's job.
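For what it's worth, one minimal client-side mitigation (not a replacement for server-side guarantees) is to deduplicate entries by task ID across pages, e.g.:

```python
def dedupe_pages(pages):
    """Yield tasks from successive pages, skipping entries already
    seen on an earlier page (tolerating the redundant entries that
    appear when the dataset shifts between requests)."""
    seen = set()
    for page in pages:
        for task in page:
            if task["id"] not in seen:
                seen.add(task["id"])
                yield task

# Page n+1 re-serves a task already seen on page n after an insert:
page_n = [{"id": "a"}, {"id": "b"}]
page_n1 = [{"id": "b"}, {"id": "c"}]  # "b" shifted down a page
tasks = list(dedupe_pages([page_n, page_n1]))
```

This handles redundant entries only; skipped entries are invisible to the client, which is exactly the objection above.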
Page based
`total` (optional) - The total number of results available in the result set
Clients typically show the range of "pages" available to the user: if I request 5 items per page out of a total of 100, I expect 100/5 = 20 pages, and when on page n the pager shows something like 1, 2 ... ->n<- ... 19, 20. To render this, the client in this approach would need the total number of items, so it can calculate offsets and paginate properly.
Servers MAY return less than the page_size, even if it is not the last page
I understand this might not be a problem for a GUI, as `page_size` would never exceed 100, but just out of curiosity... why not wait until a threshold is reached and return an error if it's too resource-intensive?
Token based
The response doesn't have a `prev_page_token`, so the client would have to cache these tokens itself, which shouldn't be its job.
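To make that burden concrete, here is a sketch of the token caching a client currently has to do to support a back button. `fetch` is a hypothetical stand-in for a `GET /tasks` call returning the tasks plus the `next_page_token`:

```python
class TokenPager:
    """Client-side history of page tokens, needed because the
    response carries no prev_page_token (a sketch)."""

    def __init__(self, fetch):
        self.fetch = fetch     # fetch(token) -> (tasks, next_page_token)
        self.history = [None]  # token used to request each visited page
        self.pos = 0

    def current(self):
        tasks, next_token = self.fetch(self.history[self.pos])
        # Remember the next token the first time we reach this page.
        if self.pos == len(self.history) - 1 and next_token:
            self.history.append(next_token)
        return tasks

    def forward(self):
        if self.pos < len(self.history) - 1:
            self.pos += 1
        return self.current()

    def back(self):
        if self.pos > 0:
            self.pos -= 1
        return self.current()

# Fake server: three pages chained by tokens.
pages = {None: (["t1", "t2"], "tokA"),
         "tokA": (["t3", "t4"], "tokB"),
         "tokB": (["t5"], None)}
pager = TokenPager(lambda token: pages[token])
```

A `prev_page_token` in the response would let the server own this state instead.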
Use case visualization
PS: I think page_based would be preferred but would require breaking changes in cloud-component.
Thanks for the insights here. It might be better to move some of these to the TASC repo, but taking the first points here:
**Data tolerance**: This is a key issue in any pagination scheme. When researching this I found many technologies attempt to use temporal pagination, or paginate through a result cached server-side, to avoid the issue altogether. I think the impact depends on how much you can or cannot tolerate repeats in data responses, and that doesn't feel like something for the pagination guide to dictate.
**Pages available**: I didn't see anything in the examples we reviewed that provided the page count, and as you said it's typically calculated by the client, so we didn't include it.
**`prev_page_token`**: The availability of that token depends on the state the server keeps. It might have zero idea what the previous page was, which is why it was omitted from the recommended returns. If the token were predictable this would be a different matter, but we pushed for obfuscated tokens (which could, of course, be based on known data).
Do you think this is an issue?
I agree that the first point should be left to implementers and that the second point is unusual and can easily be calculated by the client from the total number of records and the page size.
I also agree that a token for the previous page may or may not be useful, sensible, or possible for a given implementation, and so should not be recommended as required. It could, however, be mentioned in a note or as an optional extension (it would be cool if all the implementations that make use of it did so in the same manner).
This is great, thank you. I've added these suggestions. Please see what you think, especially as your comments are with respect to promoting consistency. Really appreciate this.