Custom Response Parsing

Open ChristianGruen opened this issue 6 years ago • 0 comments

Here is a summary of the discussion on custom response parsing (#108, #125, and others):

In version 1 of the HTTP Client Module, the override-media-type attribute was available (inspired by the override-content-type option in XProc). It could be used to override the Content-Type header of a response.

In practice, the approach turned out to be fairly flexible, but it had some shortcomings: it did not allow for fine-grained processing of multipart bodies, and it was not intuitive enough for all users.
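
For reference, a version 1 call with this attribute might look like the sketch below. The URL is a placeholder, and the code assumes a processor that implements version 1 of the module; some processors bind the http prefix automatically, others need an import module declaration instead of the namespace declaration shown here.

```xquery
(: Sketch only: placeholder URL; assumes a processor that implements
   version 1 of the EXPath HTTP Client Module. Some processors require
   "import module namespace http = ..." instead of this declaration. :)
declare namespace http = 'http://expath.org/ns/http-client';

(: Treat the response as plain text, whatever Content-Type the server sends. :)
let $result := http:send-request(
  <http:request method='get'
                href='http://example.org/data'
                override-media-type='text/plain'/>
)
return (
  $result[1],  (: the http:response element with status and headers :)
  $result[2]   (: the body, returned as xs:string because of the override :)
)
```

The override applies to the response as a whole, which is why it gives no handle on the individual parts of a multipart response.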

The following alternatives have been proposed in the scope of version 2 of the spec:

parse-response (boolean)

  • Original draft: https://github.com/expath/expath-cg/blob/1da836628bbdf831fcfc1e4ad9dc487d05e7c663/specs/http-client-2/index.html
  • Description: Parsing of the response body can be disabled via the parse-response option. All bodies of single and multipart responses will be returned as binary items of type xs:base64Binary, and the values can be processed (stored, parsed, forwarded) in a second step.

Adam pointed out that the name may be misleading, so it’s named parse-response-entity-body in the current draft. My suggestion in #108 was to call it parse-bodies: Only responses are “parsed” (requests are serialized), and the plural form indicates that we may have multiple bodies in a response.

parse-response (enum)

Adam made a suggestion for extending the proposal in #108:

  • raw. We don't have an equivalent option at the moment, but the idea is that the raw response from the server is returned, i.e. no parsing occurs: no status, no headers. This has applications for debugging and for logging responses.
  • status. This would be equivalent to status-only: true().
  • headers. This would be equivalent to parse-response-entity-body : false()
  • multipart-raw. This would extract the headers of the response and locate the multipart bodies. However, each part would be presented in a raw manner, i.e. no multipart headers would be parsed.
  • full. This would be the default, and basically the same as the current parse-response-entity-body : true()

parse-response (map)

In #125, the proposal was extended to a nested map (for further discussion, see https://github.com/expath/expath-cg/pull/125#issuecomment-430608433).

I decided to summarize the proposals as I believe that a plain and simple solution might lead to less confusion and may even be more flexible, because a user can always do post-processing in XQuery.

In my opinion, the major requirement for (non-implicit) response parsing is to be able to retrieve bodies (single part, multiple bodies) in their original representation. In #125, I proposed the following solution:

parse / parse-bodies (string)

Option   Description
auto     implicit parsing (default)
string   return all bodies as strings
binary   return all bodies as binaries
skip     ignore response body

I believe this approach would be sufficient to cover most challenges people will be confronted with (but, honestly, not all that we could envision):

  • In most cases, people will use the default (auto).
  • If the requested result cannot be converted to the implicit target format, or if a format other than the one produced by the implicit conversion is required, the string option can be used for textual results. All bodies will be converted to strings, based on the encoding that the server (optionally) returns in the original Content-Type header and on the charset option.
  • The binary option is helpful…
    • if the result is not text,
    • if the string conversion fails,
    • if some bodies of a multipart response are textual and some are binary, or
    • if the result only needs to be processed as a simple stream.
  • The skip option is used if only the headers of a result are required.
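
As a rough sketch of the binary fallback from the list above: the body is fetched without implicit conversion and then decoded with an encoding chosen by the caller. The http:get signature and the parse option follow the proposal in this issue and are not a finalized API; the URL, the encoding and the namespace binding for http are assumptions. bin:decode-string comes from the EXPath Binary Module.

```xquery
(: Sketch under assumptions: http:get and the 'parse' option follow the
   proposal in this issue, not a finalized API. The URL and the encoding
   are hypothetical, and the version 1 namespace URI is assumed for http. :)
declare namespace http = 'http://expath.org/ns/http-client';
declare namespace bin = 'http://expath.org/ns/binary';

(: Fetch the body as xs:base64Binary, without any implicit conversion ... :)
let $raw := http:get('http://example.org/legacy.txt', map { 'parse': 'binary' })?body
(: ... and decode it with an encoding chosen by the caller. :)
return bin:decode-string($raw, 'ISO-8859-1')
```

This is the kind of post-processing that keeps the option itself simple: anything beyond the four modes can be handled in XQuery after the request.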

Some more thoughts on this simplified approach are listed in #125.

Examples for using the approach:

(: return single JSON response as XML :)
http:get('http://json.db/doc123', map { 'parse': 'string' })?body
=> fn:json-to-xml()

(: store returned multipart bodies :)
for $part at $pos in http:get('http://multipart.db/data123', map { 'parse': 'binary' })?body
return file:write-binary($pos || '.bin', $part?body)

(: ignore response bodies :)
http:get('http://json.db/doc123', map { 'parse': 'skip' })

@adamretter: Maybe my thoughts are too plain and simple? Do you have more use cases in mind that we should consider? Looking forward to feedback!

ChristianGruen, Oct 22 '18 16:10