OPTIMADE icon indicating copy to clipboard operation
OPTIMADE copied to clipboard

Should pagination links include all URL query parameters (e.g. response fields)?

Open ml-evs opened this issue 2 years ago • 4 comments

This point was raised on the forums (by @oarcelus) here: https://matsci.org/t/cod-response-does-not-include-response-fields-in-next-link/39530.

Currently the pymatgen OptimadeRester (and presumably other clients) rely on the fact that they can just use the links->next URL directly for pagination, but it seems that some databases (for example, COD in the forum post @merkys) do not append URL parameters like response_fields to the pagination links, which currently breaks clients that expect e.g., COD, to return null for cartesian_site_positions.

I do not think the specification currently specifies exactly how this should behave, hence the confusion.

ml-evs avatar Nov 29 '21 15:11 ml-evs

I am almost sure the response_fields should be retained (I think this is the original intention of the specification, even if not explicitly described). Therefore, this is a bug in the COD OPTIMADE implementation, and I will look into fixing it.

I believe the original intention for the pagination URLs is that they have to contain all the parameters required to get the next page of results without any additional parameters. I will also give the specification a look just to be 100% sure.

merkys avatar Nov 29 '21 16:11 merkys

OK, I gave the specification a look, and it does not seem to be explicit regarding this. Nevertheless, I see only advantages in retaining the original URL parameters: The client does not need to "remember" the request. All its details are included in the URL parameters, thus fetching paginated results is not much more difficult than slurping non-paginated response (bonus: clients are encouraged to paginate).

Moreover, there were discussions (private exchange) about page limits depending on response_fields. In the COD we would like to reduce page limits to 10 when a client requests for cartesian_site_positions, as opposed to page limits of 100 or 1000 when atom positions are not requested. This would play nicely with retained URL parameters, and quite clumsy without them.

The single advantage of placing the onus of URL parameter setting on the client is slight increase of flexibility. By not giving any guarantees about the URL parameters, the specification would push the clients to implement URL string parsing and modification. But clients are already free to do so.

merkys avatar Nov 30 '21 06:11 merkys

This is what the specification says:

next: represents a link to fetch the next set of results. When the current response is the last page of data, this field MUST be either omitted or null-valued.

I agree that it could be more explicit, but IMO the implicit expectation of "the next set of results" is that they contain the same fields as the previous set - otherwise they are not the same results.

Also important: the specification says nothing about that the URL provided in next needs to be interpretable in terms of OPTIMADE (strictly it doesn't even need to point to the same host...), so it would be a bug for a client to assume that it can understand the next link and re-write it to, e.g., add parameters. Hence, if OPTIMADE does not require the next link to give results with the same fields, then there would exist no way for a client to reliably paginate.

rartino avatar Nov 30 '21 09:11 rartino

Fixed the COD implementation today. Now the URL parameters should be propagated.

merkys avatar Jan 03 '22 15:01 merkys

Can we close this issue as answered, or should we clarify the specification accordingly?

merkys avatar Jun 08 '23 09:06 merkys

I think we can close it, unless someone really wants to make a PR making this very explicit.

ml-evs avatar Jun 14 '23 09:06 ml-evs