Investigate JSONPath as an alternative to JMESPath
I learned about JSONPath when I noticed that Kubernetes uses it. It looks like an alternative to JMESPath. I am not sure if it is as formally specified or what other tradeoffs there might be, but its use by Kubernetes provides some basis to believe it will stick around. It would be worth understanding how they compare.
Interesting arguments that JMESPath is the more robust choice: https://github.com/OAI/OpenAPI-Specification/discussions/2556#discussioncomment-678802
I think JMESPath is the clear winner. Closing, but open to arguments if anyone wants to revive this in the future.
Postgres implements JSONPath, and it looks like SQLite followed suit. If we want predicate push-down, that is the language we will need.
However, the points raised in the linked post above stand. JMESPath is much better specified. The question is whether to expose JMESPath in the HTTP API and convert or to expose JSONPath.
As of September it looks like JSONPath is on track to be formally specified. https://datatracker.ietf.org/wg/jsonpath/about/
The RFC is dormant, but there is nonetheless an actively maintained Python implementation of the proposal.
https://pypi.org/project/jsonpath-ng/
For use cases where we want to fish just a couple fields (e.g. some columns in a table of search results) out of some sprawling JSON metadata, we can get large performance wins from predicate push-down.
SQLite supports the basics: https://www.sqlite.org/json1.html#jex
And Postgres supports quite a lot, perhaps more than we would need to expose, at least at first: https://www.postgresql.org/docs/current/functions-json.html#FUNCTIONS-SQLJSON-PATH
Latest thinking on this, per chat with @pbeaucage:
- Deprecate
select_metadataand our support JMESPath, as JMESPath does too much: too complicated, and maybe exposes too much power to clients to put load on the server - Add
selectto/searchand/metadata, accepting JSONPath. (Adding it to/searchis the important one, since the savings on a whole _page of results is the most significant.) The parameter should be typedList[str], as in?select=start.uid&select=start.scan_id. The returnedmetadatashould always be a flat JSON object, keyed like{"start.uid": ..., "start.scan_id": ...}.