Add ?_extra= mechanism for requesting extra properties in JSON
Datasette views currently work by creating a set of data that should be returned as JSON, then defining an additional, optional template_data() function which is called if the view is being rendered as HTML. This template_data() function calculates extra template context variables which are necessary for the HTML view but should not be included in the JSON.
Example of how that is used today: https://github.com/simonw/datasette/blob/2b79f2bdeb1efa86e0756e741292d625f91cb93d/datasette/views/table.py#L672-L704
With features like Facets in #255 I'm beginning to want to move more items into the template_data() - in the case of facets it's the suggested_facets array. This saves that feature from being calculated (involving several SQL queries) for the JSON case where it is unlikely to be used.
But... as an API user, I want to still optionally be able to access that information.
Solution: Add a ?_extra=suggested_facets&_extra=table_metadata argument which can be used to optionally request additional blocks to be added to the JSON API.
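As a rough illustration of what that could look like from an API client's point of view (these extras don't exist yet - the names are just the ones proposed above):

```python
import requests

# Hypothetical request: ask for two optional blocks alongside the default JSON.
# requests repeats list values, producing ?_extra=suggested_facets&_extra=table_metadata
response = requests.get(
    "https://latest.datasette.io/fixtures/facetable.json",
    params={"_extra": ["suggested_facets", "table_metadata"]},
)
data = response.json()
# The requested blocks would show up as additional top-level keys,
# e.g. data["suggested_facets"] and data["table_metadata"]
```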
Then redefine as many of the current template_data() features as extra arguments instead, and teach Datasette to return certain extras by default when rendering templates.
This could allow the JSON representation to be slimmed down further (removing e.g. the table_definition and view_definition keys) while still making that information available to API users who need it.
Idea: ?_extra=sqllog could output a log of every individual SQL statement that was executed in order to generate the page - useful for seeing how foreign key expansion and faceting actually work.
I built a version of that a while ago as the ?_trace=1 argument.
Are there any interesting use-cases for a plugin hook that allows plugins to define their own ?_extra= blocks?
Just realized I added an undocumented ?_extras= option to the row view years ago and forgot about it. Added in a30c5b220c15360d575e94b0e67f3255e120b916 - https://latest.datasette.io/fixtures/attraction_characteristic/2.json?_extras=foreign_key_tables
That will need to be made consistent with the new mechanism.
I think ?_extra=a&_extra=b is more consistent with other Datasette features (like ?_facet=col1&_facet=col2) but potentially quite verbose.
So I could support ?_extra=a,b,c as an alternative allowed syntax, or I could allow ?_extra=single and ?_extras=comma,separated.
I think I prefer allowing commas in ?_extra=.
In the documentation for ?_extra= I think I'll emphasize the comma-separated version of it. Also: there will be ?_extra= values which act as aliases for combinations of extras - e.g. ?_extra=full will toggle everything.
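A minimal sketch of how that could be parsed - comma-separated values plus an alias like full that expands to everything (the specific extra names here are illustrative):

```python
# Sketch: expand comma-separated ?_extra= values, treating "full" as an
# alias for every available extra. The extra names are illustrative.
ALL_EXTRAS = {"count", "columns", "primary_keys", "suggested_facets"}
ALIASES = {"full": ALL_EXTRAS}


def expand_extras(values):
    "values is the list from request.args.getlist('_extra')"
    extras = set()
    for value in values:
        for bit in value.split(","):
            extras.update(ALIASES.get(bit, {bit}))
    return extras


assert expand_extras(["count,columns"]) == {"count", "columns"}
assert expand_extras(["full"]) == ALL_EXTRAS
```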
I think it's worth having a plugin hook for this - it can be the same hook that is used internally. Maybe register_extra - it lets you return one or more extra implementations, each with a name and an async function that gets called.
Things like suggested facets will become register_extra hooks. Maybe actual facets too?
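The hook doesn't exist yet, but a first sketch of its shape might look something like this (the hook name, signature and return format are all up for debate):

```python
from datasette import hookimpl


@hookimpl
def register_extra(datasette):
    # Hypothetical hook: return one or more extras, each a name plus an
    # async function that is only called when that extra is requested.
    async def suggested_facets(datasette, database, table, request):
        # ... run the (potentially expensive) facet suggestion queries here ...
        return []

    return [("suggested_facets", suggested_facets)]
```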
This is now blocking https://github.com/simonw/datasette-graphql/issues/61 because that issue needs a way to turn off suggested facets when retrieving the results of a table query.
I think I should prioritize the facets component of this, since that could have significant performance wins while also supporting datasette-graphql.
This is relevant to the big refactor in:
- #1518
I spotted in https://github.com/simonw/datasette/issues/1719#issuecomment-1108888494 that there's actually already an undocumented implementation of ?_extras=foreign_key_tables - https://latest.datasette.io/fixtures/simple_primary_key/1.json?_extras=foreign_key_tables
I added that feature all the way back in November 2017! https://github.com/simonw/datasette/commit/a30c5b220c15360d575e94b0e67f3255e120b916
As suggested in this issue:
- #1721
There are three parts of the Datasette API that need to support extras:
- Table, e.g. https://latest.datasette.io/fixtures/facetable.json
- Row, e.g. https://latest.datasette.io/fixtures/facetable/1.json
- Query, e.g. https://latest.datasette.io/fixtures/neighborhood_search.json or https://latest.datasette.io/fixtures.json?sql=%0Aselect+_neighborhood%2C+facet_cities.name%2C+state%0Afrom+facetable%0A++++join+facet_cities%0A++++++++on+facetable._city_id+%3D+facet_cities.id%0Awhere+_neighborhood+like+%27%25%27+||+%3Atext+||+%27%25%27%0Aorder+by+_neighborhood%3B%0A&text=
There are two other pages I should consider though:
- https://latest.datasette.io/.json - the JSON version of the https://latest.datasette.io/ homepage
- https://latest.datasette.io/fixtures.json - note that this is different from the same URL with ?sql=... appended to it. This is the index of tables in a specific database.
I'm not actually too happy about how /fixtures.json currently entirely changes shape based on whether or not you pass a ?sql= argument to it. Maybe I can fix that disparity with extras too?
The list of tables you see on /fixtures.json without the ?sql= could become another extra (there's a rough sketch of that below). The HTML version of that page could know to request that extra by default.
This would also support running a SQL query while returning a list of tables at the same time - useful for building a SQL editor interface which hints at the tables that are available to the user, or even for generating the configuration needed by the CodeMirror editor's SQL completion, added in:
- #1893
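A rough sketch of what that tables extra could look like - extra_tables is a made-up name, though get_database(), table_names() and table_columns() are existing internal APIs:

```python
# Sketch only: a hypothetical "tables" extra for /fixtures.json, which the
# HTML index page (and a SQL editor interface) could request by default.
async def extra_tables(datasette, database):
    db = datasette.get_database(database)
    return [
        {"name": name, "columns": await db.table_columns(name)}
        for name in await db.table_names()
    ]
```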
I'm tempted NOT to document the JSON for the /.json page, simply because I'm not at all convinced that the current homepage design is the best possible use of that space - and I'd like to reserve the opportunity to redesign that in e.g. Datasette 1.1 without it being a breaking change to the documented JSON API.
Thinking about ?_extra= values just for the table JSON. The default shape will look like this:
{
"ok": true,
"rows": [{"id": 1, "name": "Name"}],
"next": null,
}
The table extras could be:
- count - adds a "count" field with a full count(*) for that filtered table
- next_url - the full URL to the next page
- columns - adds "columns": ["id", "name"]
- expandable_columns - a list of columns that can be expanded (note that "expanded_columns": [...] shows up automatically if the user passes any ?_label= options, like on https://latest.datasette.io/fixtures/facetable.json?_label=_city_id ) - I'm tempted to rename this to label_columns and have it add both label_columns and label_columns_selected or similar.
- primary_keys - a list of primary keys e.g. ["id"] - not sure what to do about rowid columns here
- query - a {"sql": "select ...", "params": {"p0": "1"}} object
- units - the units feature
- suggested_facets - suggested facets
- metadata - a {"metadata": {"source_url": "..."}} etc block - differs from current in that it would be nested in "metadata": {...}.
Stuff currently in https://latest.datasette.io/fixtures/facetable.json that is not yet covered by the above:
"database": "fixtures",
"table": "facetable",
"is_view": false,
"human_description_en": "where id = 1",
"private": false,
"allow_execute_sql": true,
"query_ms": 16.749476999393664,
I'm tempted to bundle database, table, is_view and human_description_en into one (not sure what to call it though, perhaps display_details?) - and then drop allow_execute_sql entirely and have private and query_ms as their own named extras.
Or maybe have a permissions extra which includes allow_execute_sql and private? Could anything else go in there?
In most cases, the ?_extra=xxx name exactly corresponds to the additional key that is added to the JSON.
?_facet=... is one example of a query string argument that causes an extra key - "facet_results" - to be added to the JSON even though it wasn't requested by name in a ?_extra=.
Am I OK with that? I think so.
Related issue:
- #1558
Actually there's an edge-case here that's worth considering: it's possible to use metadata to set default facets for a table. If you do this for a table, then .json for that table will always calculate and return those facets - which may be an expensive and unnecessary operation.
So maybe we don't include facet_results in the JSON unless explicitly asked for in that case, but have a rule that ?_facet implies ?_extra=facet_results.
I'm going to write code which parses ?_extra= in the comma-separated or multiple parameter format and then looks up functions in a dictionary. It will return an error if you ask for an extra that does not exist.
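Roughly this shape (a sketch, not the final implementation) - it also bakes in the "?_facet implies facet_results" rule from above:

```python
# Sketch: parse ?_extra= (comma-separated or repeated), apply the
# "?_facet= implies facet_results" rule, then resolve each requested
# extra via a dictionary of async functions, erroring on unknown names.
async def resolve_extras(request, registered_extras):
    "registered_extras maps extra name -> async function returning that block"
    extras = set()
    for bit in request.args.getlist("_extra"):
        extras.update(bit.split(","))
    if request.args.getlist("_facet"):
        extras.add("facet_results")
    unknown = extras - set(registered_extras)
    if unknown:
        # In the real view this would become a 400 Bad Request response
        raise ValueError("Invalid _extra: {}".format(", ".join(sorted(unknown))))
    return {name: await registered_extras[name]() for name in extras}
```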
Got a first prototype working using asyncinject and it's pretty nice:
diff --git a/datasette/views/table.py b/datasette/views/table.py
index ad45ecd3..c8690b22 100644
--- a/datasette/views/table.py
+++ b/datasette/views/table.py
@@ -2,6 +2,7 @@ import asyncio
import itertools
import json
+from asyncinject import Registry
import markupsafe
from datasette.plugins import pm
@@ -538,57 +539,60 @@ class TableView(DataView):
# Execute the main query!
results = await db.execute(sql, params, truncate=True, **extra_args)
- # Calculate the total count for this query
- count = None
- if (
- not db.is_mutable
- and self.ds.inspect_data
- and count_sql == f"select count(*) from {table_name} "
- ):
- # We can use a previously cached table row count
- try:
- count = self.ds.inspect_data[database_name]["tables"][table_name][
- "count"
- ]
- except KeyError:
- pass
-
- # Otherwise run a select count(*) ...
- if count_sql and count is None and not nocount:
- try:
- count_rows = list(await db.execute(count_sql, from_sql_params))
- count = count_rows[0][0]
- except QueryInterrupted:
- pass
-
- # Faceting
- if not self.ds.setting("allow_facet") and any(
- arg.startswith("_facet") for arg in request.args
- ):
- raise BadRequest("_facet= is not allowed")
+ # Resolve extras
+ extras = _get_extras(request)
+ if request.args.getlist("_facet"):
+ extras.add("facet_results")
- # pylint: disable=no-member
- facet_classes = list(
- itertools.chain.from_iterable(pm.hook.register_facet_classes())
- )
- facet_results = {}
- facets_timed_out = []
- facet_instances = []
- for klass in facet_classes:
- facet_instances.append(
- klass(
- self.ds,
- request,
- database_name,
- sql=sql_no_order_no_limit,
- params=params,
- table=table_name,
- metadata=table_metadata,
- row_count=count,
- )
+ async def extra_count():
+ # Calculate the total count for this query
+ count = None
+ if (
+ not db.is_mutable
+ and self.ds.inspect_data
+ and count_sql == f"select count(*) from {table_name} "
+ ):
+ # We can use a previously cached table row count
+ try:
+ count = self.ds.inspect_data[database_name]["tables"][table_name][
+ "count"
+ ]
+ except KeyError:
+ pass
+
+ # Otherwise run a select count(*) ...
+ if count_sql and count is None and not nocount:
+ try:
+ count_rows = list(await db.execute(count_sql, from_sql_params))
+ count = count_rows[0][0]
+ except QueryInterrupted:
+ pass
+ return count
+
+ async def facet_instances(extra_count):
+ facet_instances = []
+ facet_classes = list(
+ itertools.chain.from_iterable(pm.hook.register_facet_classes())
)
+ for facet_class in facet_classes:
+ facet_instances.append(
+ facet_class(
+ self.ds,
+ request,
+ database_name,
+ sql=sql_no_order_no_limit,
+ params=params,
+ table=table_name,
+ metadata=table_metadata,
+ row_count=extra_count,
+ )
+ )
+ return facet_instances
+
+ async def extra_facet_results(facet_instances):
+ facet_results = {}
+ facets_timed_out = []
- async def execute_facets():
if not nofacet:
# Run them in parallel
facet_awaitables = [facet.facet_results() for facet in facet_instances]
@@ -607,9 +611,13 @@ class TableView(DataView):
facet_results[key] = facet_info
facets_timed_out.extend(instance_facets_timed_out)
- suggested_facets = []
+ return {
+ "results": facet_results,
+ "timed_out": facets_timed_out,
+ }
- async def execute_suggested_facets():
+ async def extra_suggested_facets(facet_instances):
+ suggested_facets = []
# Calculate suggested facets
if (
self.ds.setting("suggest_facets")
@@ -624,8 +632,15 @@ class TableView(DataView):
]
for suggest_result in await gather(*facet_suggest_awaitables):
suggested_facets.extend(suggest_result)
+ return suggested_facets
+
+ # Faceting
+ if not self.ds.setting("allow_facet") and any(
+ arg.startswith("_facet") for arg in request.args
+ ):
+ raise BadRequest("_facet= is not allowed")
- await gather(execute_facets(), execute_suggested_facets())
+ # pylint: disable=no-member
# Figure out columns and rows for the query
columns = [r[0] for r in results.description]
@@ -732,17 +747,56 @@ class TableView(DataView):
rows = rows[:page_size]
# human_description_en combines filters AND search, if provided
- human_description_en = filters.human_description_en(
- extra=extra_human_descriptions
- )
+ async def extra_human_description_en():
+ human_description_en = filters.human_description_en(
+ extra=extra_human_descriptions
+ )
+ if sort or sort_desc:
+ human_description_en = " ".join(
+ [b for b in [human_description_en, sorted_by] if b]
+ )
+ return human_description_en
if sort or sort_desc:
sorted_by = "sorted by {}{}".format(
(sort or sort_desc), " descending" if sort_desc else ""
)
- human_description_en = " ".join(
- [b for b in [human_description_en, sorted_by] if b]
- )
+
+ async def extra_next_url():
+ return next_url
+
+ async def extra_columns():
+ return columns
+
+ async def extra_primary_keys():
+ return pks
+
+ registry = Registry(
+ extra_count,
+ extra_facet_results,
+ extra_suggested_facets,
+ facet_instances,
+ extra_human_description_en,
+ extra_next_url,
+ extra_columns,
+ extra_primary_keys,
+ )
+
+ results = await registry.resolve_multi(
+ ["extra_{}".format(extra) for extra in extras]
+ )
+ data = {
+ "ok": True,
+ "rows": rows[:page_size],
+ "next": next_value and str(next_value) or None,
+ }
+ data.update({
+ key.replace("extra_", ""): value
+ for key, value in results.items()
+ if key.startswith("extra_")
+ and key.replace("extra_", "") in extras
+ })
+ return Response.json(data, default=repr)
async def extra_template():
nonlocal sort
@@ -1334,3 +1388,11 @@ class TableDropView(BaseView):
await db.execute_write_fn(drop_table)
return Response.json({"ok": True}, status=200)
+
+
+def _get_extras(request):
+ extra_bits = request.args.getlist("_extra")
+ extras = set()
+ for bit in extra_bits:
+ extras.update(bit.split(","))
+ return extras
With that in place, http://127.0.0.1:8001/content/releases?author=25778&_size=1&_extra=count,primary_keys,columns&_facet=author returns:
{
"ok": true,
"rows": [
{
"html_url": "https://github.com/eyeseast/geocode-sqlite/releases/tag/0.1.2",
"id": 30926270,
"author": {
"value": 25778,
"label": "eyeseast"
},
"node_id": "MDc6UmVsZWFzZTMwOTI2Mjcw",
"tag_name": "0.1.2",
"target_commitish": "master",
"name": "v0.1.2",
"draft": 0,
"prerelease": 1,
"created_at": "2020-09-08T17:48:24Z",
"published_at": "2020-09-08T17:50:15Z",
"body": "Basic API is in place, with CLI support for Google, Bing, MapQuest and Nominatum (OSM) geocoders.",
"repo": {
"value": 293361514,
"label": "geocode-sqlite"
},
"reactions": null,
"mentions_count": null
}
],
"next": "30926270",
"primary_keys": [
"id"
],
"columns": [
"html_url",
"id",
"author",
"node_id",
"tag_name",
"target_commitish",
"name",
"draft",
"prerelease",
"created_at",
"published_at",
"body",
"repo",
"reactions",
"mentions_count"
],
"count": 25,
"facet_results": {
"results": {
"author": {
"name": "author",
"type": "column",
"hideable": true,
"toggle_url": "/content/releases?author=25778&_size=1&_extra=count%2Cprimary_keys%2Ccolumns",
"results": [
{
"value": 25778,
"label": "eyeseast",
"count": 25,
"toggle_url": "http://127.0.0.1:8001/content/releases?_size=1&_extra=count%2Cprimary_keys%2Ccolumns&_facet=author",
"selected": true
}
],
"truncated": false
}
},
"timed_out": []
}
}
Implementing this to work with the .json extension is going to be a lot harder.
The challenge here is that we're working with the whole BaseView() vs. TableView() abstraction, which I've been wanting to get rid of for a long time.
BaseView() calls .data() and expects to get back a (data, extra_template_data, templates) tuple - then if a format is in play (.json or .geojson or similar from a plugin) it hands off data to that. If .csv is involved it does something special, in order to support streaming responses. And if it's regular HTML it calls await extra_template_data() and combines that with data and passes it to the template.
I want this to work completely differently: I want the formats (including HTML) to have the option of adding some extra ?_extra= extras, then I want HTML to be able to render the page entirely from the JSON if necessary.
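One very rough sketch of that direction - each format declares the extras it needs, the view resolves the union of those with whatever was requested, and HTML just consumes the resulting JSON-shaped dictionary. None of this is Datasette's current internals:

```python
# Sketch of the idea, not real Datasette code: formats (including HTML)
# declare the extras they need; the view builds one JSON-shaped dict and
# every format renders from it.
FORMAT_EXTRAS = {
    "html": {"facet_results", "suggested_facets", "human_description_en"},
    "json": set(),
    "csv": {"columns"},
}


async def render_page(format_key, requested_extras, resolve_data, render_format):
    extras = requested_extras | FORMAT_EXTRAS.get(format_key, set())
    data = await resolve_data(extras)  # builds the JSON-shaped dictionary
    return await render_format(format_key, data)
```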
I pushed my prototype so far, going to start a draft PR for it.
It's annoying that the https://docs.datasette.io/en/0.64.1/plugin_hooks.html#register-output-renderer-datasette plugin hook passes rows as "list of sqlite3.Row objects" - I'd prefer it if that plugin hook worked with JSON data, not sqlite3.Row.
https://docs.datasette.io/en/0.64.1/plugin_hooks.html#render-cell-row-value-column-table-database-datasette is documented as accepting Row but actually gets CustomRow, see:
- #1973
Maybe "rows"
should be a default ?_extra=
... but it should be possible to request "arrays"
instead which would be a list of arrays, more suitable perhaps for custom renderers such as the CSV one.
This could be quite neat, in that EVERY key in the JSON representation would be defined as an extra - just some would be on by default. There could even be a mechanism for turning them back off again, maybe using ?_extra=-rows
.
In which case maybe ?_extra= isn't actually the right name for this feature. It could be ?_key= perhaps, or ?_field=.
Being able to pass ?_field=count,-rows to get back just the count (and skip executing the rows query entirely) would be pretty neat.
Although ?_only=count would be tidier. So maybe the pair of ?_only= and ?_extra= would make sense.
Would ?_only=rows still return the "ok" field so you can always look at that to confirm an error didn't occur?
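A sketch of how ?_only= and a "-" prefix on ?_extra= could combine - nothing here is implemented, it's just the rules from the discussion above:

```python
# Sketch: decide which keys to return given ?_only= and ?_extra= values.
DEFAULT_KEYS = {"ok", "rows", "next"}


def keys_to_return(only_values, extra_values):
    "only_values / extra_values are the lists from ?_only= and ?_extra="
    if only_values:
        keys = {bit for value in only_values for bit in value.split(",")}
        return keys | {"ok"}  # always include "ok" so errors stay detectable
    keys = set(DEFAULT_KEYS)
    for value in extra_values:
        for bit in value.split(","):
            if bit.startswith("-"):
                keys.discard(bit[1:])  # e.g. ?_extra=-rows drops a default key
            else:
                keys.add(bit)
    return keys
```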
This issue here would benefit from some kind of mechanism for returning just the HTML of the table itself, without any of the surrounding material. I'm not sure if that would make sense as an extra or not:
- https://github.com/simonw/datasette-search-all/issues/17
I think that does make sense: ?_extra=table perhaps, which would add {"table": "..."}.
Just realized that it's useful to be able to tell what parameters were used to generate a page... but reflecting things like _next back in the JSON is confusing in the presence of next. So I'm going to add an extra for that information too.
Not sure what to call it though:
- params - confusing because in the code that's usually used for params passed to SQL queries
- query_string - wouldn't that be a string, not params as a dictionary?
I'm going to experiment with a request extra that returns some bits of information about the request.
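Something like this, maybe (a sketch - the exact keys are undecided):

```python
# Sketch: a hypothetical "request" extra that reflects back how the page
# was generated, without colliding with response keys like "next".
async def extra_request(request):
    return {
        "path": request.path,
        "full_path": request.full_path,
        "args": {key: request.args.getlist(key) for key in request.args.keys()},
    }
```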
I just landed this PR so this feature is now in main:
- #1999
Still needs documentation and maybe some extra tests too.
I need to get the arbitrary query page to return the same format. It likely won't have nearly as many extras.