Avoid arbitrary keys in query and result batches

Open osma opened this issue 5 years ago • 2 comments

Currently a query batch is a JSON object whose keys can be anything, e.g. (from the current spec, Example 3):

{
  "q1": {
    "query": "Hans-Eberhard Urbaniak"
  },
  "q2": {
    "query": "Ernst Schwanhold"
  }
}

The arbitrary keys make it difficult to define a clear JSON Schema data model that could be used within an OpenAPI spec to validate the data. It is possible to write a schema (the spec already has one), but this requires matching the keys with the catch-all regex "^.*$", which I think isn't very elegant and could make genuine problems hard to detect. See this comment where the issue was mentioned.

I suggest changing the batch definition to a JSON list where each query has a mandatory qid field specifying the query ID, like this:

[
  { 
    "qid": "q1",
    "query": "Hans-Eberhard Urbaniak"
  },
  { 
    "qid": "q2",
    "query": "Ernst Schwanhold"
  }
]
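To make the difference concrete, here is a minimal Python sketch of the two shapes and a conversion between them. The schema fragments are simplified illustrations written for this comment, not the spec's actual schema, and the names `KEYED_SCHEMA`, `LIST_SCHEMA` and `keyed_batch_to_list` are hypothetical.

```python
import json

# Simplified schema for the current shape: every key is accepted as a query ID.
# (Illustrative only; not copied from the spec.)
KEYED_SCHEMA = {
    "type": "object",
    "patternProperties": {
        "^.*$": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        }
    },
}

# Simplified schema for the proposed shape: a list of objects with fixed keys.
LIST_SCHEMA = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "qid": {"type": "string"},
            "query": {"type": "string"},
        },
        "required": ["qid", "query"],
    },
}


def keyed_batch_to_list(batch: dict) -> list:
    """Turn {"q1": {...}} into [{"qid": "q1", ...}], keeping the key order."""
    return [{"qid": qid, **query} for qid, query in batch.items()]


old_batch = {
    "q1": {"query": "Hans-Eberhard Urbaniak"},
    "q2": {"query": "Ernst Schwanhold"},
}
new_batch = keyed_batch_to_list(old_batch)
print(json.dumps(new_batch, indent=2))
```

In the list form every property name is known in advance, so the schema needs no pattern matching on keys.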

Likewise, the result batch would switch from an object indexed using query IDs to a list of result objects where each result object has a qid field matching the query ID of the corresponding query.
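A client could then pair results with queries roughly as follows. This is only a sketch: the `result` field name in each result object and the helper name `match_results_by_qid` are assumptions made for illustration.

```python
def match_results_by_qid(queries: list, results: list) -> list:
    """Pair each query with the result object that carries the same qid."""
    by_qid = {}
    for result in results:
        qid = result["qid"]  # KeyError if the mandatory field is missing
        if qid in by_qid:
            raise ValueError(f"duplicate qid in results: {qid!r}")
        by_qid[qid] = result

    expected = {query["qid"] for query in queries}
    if expected != set(by_qid):
        raise ValueError("result qids do not match query qids")

    # Results may arrive in any order; the output follows the query order.
    return [(query, by_qid[query["qid"]]) for query in queries]


queries = [
    {"qid": "q1", "query": "Hans-Eberhard Urbaniak"},
    {"qid": "q2", "query": "Ernst Schwanhold"},
]
# Hypothetical result batch, deliberately returned out of order.
results = [
    {"qid": "q2", "result": []},
    {"qid": "q1", "result": []},
]
pairs = match_results_by_qid(queries, results)
```

Because matching goes through `qid`, the server is free to return results in any order.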

osma, Jan 11 '20 18:01

If we are to use an array, can't we omit the query ID altogether and rely on the query order? The responses would then have to be in the same order as the queries.

For instance, Elasticsearch's bulk API relies on the order of queries: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
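With purely positional matching, the client side reduces to something like the sketch below. The helper name `pair_by_position` and the `result` field are hypothetical, and the length check is the only consistency test that remains possible.

```python
def pair_by_position(queries: list, results: list) -> list:
    """Pair the i-th query with the i-th result; no IDs are involved."""
    if len(queries) != len(results):
        raise ValueError(
            f"expected {len(queries)} results, got {len(results)}"
        )
    return list(zip(queries, results))


queries = [
    {"query": "Hans-Eberhard Urbaniak"},
    {"query": "Ernst Schwanhold"},
]
# Hypothetical result batch; the server must preserve the query order.
results = [{"result": []}, {"result": []}]
pairs = pair_by_position(queries, results)
```

The trade-off is that a reordered response cannot be detected, whereas a dropped result at least shows up as a length mismatch.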

By the way, Elasticsearch's API actually uses newline-delimited JSON, not even an array, to submit multiple queries. I wonder if that is something we should consider; perhaps it would make it easier to submit queries in a streaming fashion. But I assume OpenAPI is not going to like that?

wetneb, Jan 11 '20 19:01

Yes, using the order would work too. I'm OK with that as well - just want to get rid of the arbitrary keys.

Newline-delimited JSON (known as NDJSON or JSON Lines) would be an option too, but as you say, it doesn't play well with OpenAPI. I think you can specify that a payload is in that format, but you then cannot do any validation on the content.
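For reference, this is roughly what a newline-delimited batch would look like with Python's standard `json` module. The helper names `to_ndjson` and `from_ndjson` are hypothetical and nothing here is part of the reconciliation spec.

```python
import json
from typing import Iterable, Iterator


def to_ndjson(queries: Iterable[dict]) -> str:
    """Serialize one JSON document per line.

    json.dumps without `indent` never emits a raw newline, so each
    query occupies exactly one line.
    """
    return "".join(json.dumps(query) + "\n" for query in queries)


def from_ndjson(payload: str) -> Iterator[dict]:
    """Parse line by line, so queries can be handled as they arrive."""
    for line in payload.split("\n"):
        if line.strip():  # skip the empty chunk after the final newline
            yield json.loads(line)


queries = [
    {"query": "Hans-Eberhard Urbaniak"},
    {"query": "Ernst Schwanhold"},
]
payload = to_ndjson(queries)
decoded = list(from_ndjson(payload))
```

Each line is a complete JSON document, which is what makes incremental processing possible, but the payload as a whole is not valid JSON.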

If we want true streaming (keep the connection open, submit a stream of queries, then receive responses, rinse and repeat...), that would be possible too; I've seen some HTTP APIs that do that. But I suspect it's not worth the trouble for reconciliation.

osma, Jan 11 '20 20:01