Automatic pagination that continues until exhaustion

Open · joshuaclayton opened this issue 1 year ago · 6 comments

Problem to solve

Each chained request must be declared explicitly, which makes paginating through a resource with an unknown number of pages untenable.

POST https://URL
Content-Type: application/json
[BasicAuth]
user: pass
{
  "payload": "body"
}
HTTP/2 200
[Captures]
body: body
next_page: header "Link" regex "<([^>]+)>; rel=\"next\""

POST {{next_page}}
[BasicAuth]
user: pass
{
  "payload": "body"
}
HTTP/2 200
[Captures]
body: body
next_page: header "Link" regex "<([^>]+)>; rel=\"next\""

# keep repeating somehow - programmatically generate hurl files? Continue with copy/paste?

Proposal

The simplest approach would be to swap in the captured URL wholesale whenever the capture is present, without any additional modification. This would allow simple asserts and body captures that are decoupled from raw values and based on structure instead (e.g. the presence of a field in a JSON response). Asserting against raw values likely wouldn't make sense for anything dynamic, given generic pagination.

In that case, an additional section might work:

[PaginatesVia]
url: header "Link" regex "<([^>]+)>; rel=\"next\""
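To pin down the intended semantics, here is a minimal Rust sketch of what a [PaginatesVia] section would do. Since the section is only a proposal, this is not Hurl code. The sketch extracts the rel="next" target from the Link header, mirroring the regex above, and keeps following it until no next link remains. The `fetch` closure (a stand-in for the HTTP request), the helper names, and the `max_pages` guard are all hypothetical.

```rust
/// Return the URL marked rel="next" in a Link header value, mirroring the
/// regex `<([^>]+)>; rel="next"`. Splitting on ',' assumes the URLs
/// themselves contain no commas.
fn next_link(link_header: &str) -> Option<String> {
    link_header.split(',').find_map(|entry| {
        let mut parts = entry.split(';').map(str::trim);
        // The first part is the target, wrapped in angle brackets.
        let url = parts.next()?.strip_prefix('<')?.strip_suffix('>')?;
        // Keep this entry only if one of its parameters is exactly rel="next".
        if parts.any(|param| param == "rel=\"next\"") {
            Some(url.to_string())
        } else {
            None
        }
    })
}

/// Follow rel="next" links from `start` until a response has no next link
/// or `max_pages` pages have been requested. `fetch` is a hypothetical
/// stand-in for the request: it returns the response's Link header, if any.
fn paginate<F>(start: &str, max_pages: usize, mut fetch: F) -> Vec<String>
where
    F: FnMut(&str) -> Option<String>,
{
    let mut visited = Vec::new();
    let mut current = Some(start.to_string());
    while let Some(url) = current.take() {
        if visited.len() == max_pages {
            break; // guard against servers whose links form a cycle
        }
        let link = fetch(&url);
        visited.push(url);
        // Swap the URL wholesale when a next link is present; stop otherwise.
        current = link.as_deref().and_then(next_link);
    }
    visited
}
```

The `max_pages` cap is a design choice of the sketch. Any real looping construct probably needs some bound so that a misbehaving server cannot cause an endless run.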

Other approaches might include more specific data capture (e.g. parsing page=5 from the Link header for the correct page, or querying the JSON response if that's where pagination info sits).
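For the first alternative, the following small Rust sketch reads the page number out of the captured next URL instead of using the URL wholesale. The function name is hypothetical, and percent-encoding and repeated keys are ignored.

```rust
/// Read an unsigned integer query parameter (e.g. `page`) from a URL such
/// as the one captured from the Link header. No percent-decoding is done.
fn query_param_u32(url: &str, name: &str) -> Option<u32> {
    let (_, query) = url.split_once('?')?;
    // Ignore any fragment after '#'.
    let query = query.split('#').next().unwrap_or("");
    query.split('&').find_map(|pair| {
        let (key, value) = pair.split_once('=')?;
        if key == name {
            value.parse().ok()
        } else {
            None
        }
    })
}
```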

Additional context and resources

Specific use case: data extraction (rather than response assertion) against paginated resources of unknown size.

I looked for looping functionality in the grammar and didn't find any. While I understand it may be possible to chain requests using JSON output + shell + jq or similar, in an ideal world there'd be a mechanism for this within the grammar itself.

joshuaclayton · Jun 11 '24 11:06

Thanks @joshuaclayton for your issue. Automatic pagination is a really interesting and challenging use case. It would be nice if it could fit into a more general looping mechanism that is not specific to pagination. We already have skip; we might also add repeat, with either a fixed number of repetitions or a termination condition (similar to retry).

We need plenty of examples to see how it could work.

fabricereix · Jun 11 '24 13:06

With --skip and --repeat, one can imagine a file like this:

POST {{url}}
[Options]
repeat: -1 # infinite loop
skip: {{url}} isNull
{
  "payload": "body"
}
HTTP/2 200
[Captures]
body: body
url: header "Link" regex "<([^>]+)>; rel=\"next\""

We initialize the variable url with an initial value, play the request if this variable is not null, update url from the capture, and repeat. What is missing: when the capture for url fails, Hurl treats it as an error, whereas we want to continue the run. In that case, we could give the capture a default value to use when it fails: url: header "Link" regex "<([^>]+)>; rel=\"next\"" default null

POST {{url}}
[Options]
repeat: -1 # infinite loop
skip: {{url}} isNull
{
  "payload": "body"
}
HTTP/2 200
[Captures]
body: body
url: header "Link" regex "<([^>]+)>; rel=\"next\"" default null

In summary, we could use repeat and skip without too many syntax changes:

  • accept a predicate in skip
  • find a way to make a capture "fallible", for instance with a default value
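The following is a minimal Rust model of the two proposed changes working together. `Value` is a simplified stand-in for Hurl's variable values, and `next_link_of` stands in for the request plus the Link header regex; both are hypothetical. A failed capture yields the default `Null` instead of an error, and an iteration whose url is `Null` is skipped. Because skip only skips an iteration, the sketch takes a finite repeat count so that it terminates.

```rust
/// Simplified stand-in for a Hurl variable value (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Str(String),
}

/// A "fallible" capture: if extraction fails, fall back to `default`
/// instead of failing the run (the proposed `default null` / `else null`).
fn capture_or<F>(extract: F, default: Value) -> Value
where
    F: FnOnce() -> Option<String>,
{
    extract().map(Value::Str).unwrap_or(default)
}

/// Play the entry `repeat` times, skipping it whenever `url` is Null (the
/// proposed `skip: {{url}} isNull`). Returns the URLs actually requested.
fn run<F>(initial_url: &str, repeat: usize, mut next_link_of: F) -> Vec<String>
where
    F: FnMut(&str) -> Option<String>,
{
    let mut url = Value::Str(initial_url.to_string());
    let mut played = Vec::new();
    for _ in 0..repeat {
        let current = match &url {
            Value::Null => continue, // skip predicate holds: entry not played
            Value::Str(s) => s.clone(),
        };
        played.push(current.clone());
        // Capture the next URL; on failure use Null rather than erroring.
        url = capture_or(|| next_link_of(&current), Value::Null);
    }
    played
}
```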

jcamiel · Jun 12 '24 09:06

A better syntax for default could be else:

POST {{url}}
[Options]
repeat: -1 # infinite loop
skip: {{url}} isNull
{
  "payload": "body"
}
HTTP/2 200
[Captures]
body: body
url: header "Link" regex "<([^>]+)>; rel=\"next\"" else null

jcamiel · Jun 13 '24 09:06

One possible solution is the repeat feature, which has been developed by @jcamiel and will be available in the next release.

For example, to retrieve the list of tags from a repo with the GitLab API, all we have to do is create pagination.hurl:

  • Make a first request to get the total number of pages:
GET {{gitlab_api_url}}/projects/{{gitlab_project_id}}/repository/tags?private_token={{gitlab_token}}&per_page={{per_page}}&page=1
Content-Type: application/json

HTTP 200

[Captures]
total_pages: header "X-Total-Pages" toInt
  • Then iterate with repeat, capturing the next page from each response:
GET {{gitlab_api_url}}/projects/{{gitlab_project_id}}/repository/tags?private_token={{gitlab_token}}&sort=desc&order_by=version&per_page={{per_page}}&page={{next_page}}
Content-Type: application/json
[Options]
repeat: {{total_pages}}

HTTP 200

[Captures]
next_page: header "X-Next-Page"
  • Finally, run hurl and set the initial variables:
$ hurl \
    --variable gitlab_api_url=https://gitlab.com/api/v4 \
    --variable gitlab_project_id=1 \
    --variable gitlab_token=***** \
    --variable per_page=1 \
    --variable next_page=1 \
    pagination.hurl
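The control flow of this two-entry file can be modeled in a few lines of Rust. This is a sketch, not Hurl internals. `x_next_page` is a hypothetical stand-in for the server returning the X-Next-Page header. The sketch treats an empty or non-numeric value as "no next page" and stops early instead of sending an empty page value.

```rust
/// Parse an X-Next-Page value; empty or non-numeric means "no next page".
fn parse_next_page(header: &str) -> Option<u32> {
    header.trim().parse().ok()
}

/// Mirror the file: `total_pages` comes from X-Total-Pages on the first
/// request, then the second entry is repeated that many times, each
/// iteration requesting `next_page` and capturing the following one.
fn pages_requested<F>(total_pages: u32, first_page: u32, x_next_page: F) -> Vec<u32>
where
    F: Fn(u32) -> String,
{
    let mut requested = Vec::new();
    let mut next_page = Some(first_page);
    for _ in 0..total_pages {
        // Defensive stop if the header ran out before total_pages.
        let Some(page) = next_page else { break };
        requested.push(page);
        next_page = parse_next_page(&x_next_page(page));
    }
    requested
}
```

With `--variable next_page=1`, the repeated entry requests pages 1 through total_pages. The first entry only serves to read the total.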

lepapareil · Jun 17 '24 09:06

@lepapareil it's been quite a while since I last commented, but I wanted to give heartfelt thanks for your GitLab example!

One additional data point (as I'm using Hurl more) is a case where the URL allows for pagination (specifically, limit and offset query params) but the response includes no additional information whatsoever about page size or record count. This obviously presents an even more challenging situation.

repeat could potentially get us very close if it allowed for a non-number definition (like running until a jsonpath statement evaluates to true), in combination with two other pathways:

  1. allow for "mutable state", e.g. incrementing a variable (e.g. offset) by a value and allowing repeat to go until some condition is met, or
  2. making repeat_count a variable that can be threaded through, with a multiplication operation, to calculate offset

Threading repeat_count through to the variables looks easy enough. Within runner/hurl_file.rs, adding this at the top of the loop:

        variables.insert(
            "repeat_count".to_string(),
            crate::runner::Value::Number(crate::runner::Number::from(repeat_count as i64)),
        );

would suffice.
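Here is a minimal Rust model of pathway 2 combined with the snippet above. It assumes a 0-based `repeat_count` and uses a simplified, hypothetical `Value` enum rather than the real `crate::runner::Value`. Each iteration inserts `repeat_count` into the variable map, and the offset is then derived as `repeat_count * per_page`, which is the multiplication the proposal asks for.

```rust
use std::collections::HashMap;

/// Simplified stand-in for Hurl's runtime values (hypothetical).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Number(i64),
}

/// Model of the proposed change: at the top of each repeat iteration the
/// runner exposes the iteration index as `repeat_count`; the offset for
/// that iteration is then repeat_count * per_page.
fn offsets_for(repeat: usize, per_page: i64) -> Vec<i64> {
    let mut variables: HashMap<String, Value> = HashMap::new();
    let mut offsets = Vec::new();
    for repeat_count in 0..repeat {
        variables.insert(
            "repeat_count".to_string(),
            Value::Number(repeat_count as i64),
        );
        // Read the variable back, as a template would, and multiply.
        let offset = match &variables["repeat_count"] {
            Value::Number(n) => n * per_page,
        };
        offsets.push(offset);
    }
    offsets
}
```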

An example of "mutating" a value could look something like:

GET {{url}}?offset={{offset}}&per_page={{per_page}}
Authorization: Bearer {{api_key}}
Accept: application/json

HTTP/1.1 200
[Captures]
offset: jsonpath "$.messages" count

GET {{url}}?offset={{offset}}&per_page={{per_page}}
Authorization: Bearer {{api_key}}
Accept: application/json
[Options]
repeat: jsonpath "$.messages" count > 0

HTTP 200
[Captures]
offset: variable "offset" + jsonpath "$.messages" count
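The loop this example describes (pathway 1) looks like this as a Rust sketch, which folds the two entries into a single loop starting at offset 0. `fetch_page` is a hypothetical stand-in for the GET request returning the `$.messages` array. The loop adds each page's count to `offset` and stops at the first empty page, as the proposed `repeat: jsonpath "$.messages" count > 0` would.

```rust
/// Pathway 1: request pages by offset, add each page's message count to
/// `offset`, and stop when a page comes back empty. Returns the collected
/// items and the number of requests made.
fn collect_all<T, F>(per_page: usize, mut fetch_page: F) -> (Vec<T>, usize)
where
    F: FnMut(usize, usize) -> Vec<T>,
{
    let mut offset = 0;
    let mut requests = 0;
    let mut all = Vec::new();
    loop {
        let page = fetch_page(offset, per_page);
        requests += 1;
        if page.is_empty() {
            break; // termination condition: no more messages
        }
        // offset: variable "offset" + jsonpath "$.messages" count
        offset += page.len();
        all.extend(page);
    }
    (all, requests)
}
```

Without any total in the response, this approach always spends one extra request on the final, empty page.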

Mathematical operations, even "simple" ones like the ones above, add a layer of complexity that I understand may not be in line with Hurl's core values. At the same time, some APIs don't provide ideal metadata such as pagination links, and operations like these could give Hurl a viable workaround for them. Curious to hear your thoughts!

joshuaclayton · May 07 '25 12:05

Hi @jcamiel

The only limitation I see is advanced pagination. Having an internal variable "repeat_index", and being able to use arithmetic operations to assign and increment a variable, would make it much more powerful. @joshuaclayton is on the right track.

Thank you for the tool, it's amazing!

RaulBSC · May 23 '25 19:05