datasette icon indicating copy to clipboard operation
datasette copied to clipboard

Add ?_extra= mechanism for requesting extra properties in JSON

Open simonw opened this issue 6 years ago • 27 comments

Datasette views currently work by creating a set of data that should be returned as JSON, then defining an additional, optional template_data() function which is called if the view is being rendered as HTML.

This template_data() function calculates extra template context variables which are necessary for the HTML view but should not be included in the JSON.

Example of how that is used today: https://github.com/simonw/datasette/blob/2b79f2bdeb1efa86e0756e741292d625f91cb93d/datasette/views/table.py#L672-L704

With features like Facets in #255 I'm beginning to want to move more items into the template_data() - in the case of facets it's the suggested_facets array. This saves that feature from being calculated (involving several SQL queries) for the JSON case where it is unlikely to be used.

But... as an API user, I want to still optionally be able to access that information.

Solution: Add a ?_extra=suggested_facets&_extra=table_metadata argument which can be used to optionally request additional blocks to be added to the JSON API.

Then redefine as many of the current template_data() features as extra arguments instead, and teach Datasette to return certain extras by default when rendering templates.

This could allow the JSON representation to be slimmed down further (removing e.g. the table_definition and view_definition keys) while still making that information available to API users who need it.

simonw avatar May 16 '18 14:05 simonw

Idea: ?_extra=sqllog could output a lot of every individual SQL statement that was executed in order to generate the page - useful for seeing how foreign key expansion and faceting actually works.

simonw avatar May 17 '18 00:05 simonw

Idea: ?_extra=sqllog could output a lot of every individual SQL statement that was executed in order to generate the page - useful for seeing how foreign key expansion and faceting actually works.

I built a version of that a while ago as the ?_trace=1 argument.

simonw avatar Sep 12 '20 18:09 simonw

Are there any interesting use-cases for a plugin hook that allows plugins to define their own ?_extra= blocks?

simonw avatar Sep 12 '20 18:09 simonw

Just realized I added an undocumented ?_extras= option to the row view years ago and forgot about it - it's not even documented. Added in a30c5b220c15360d575e94b0e67f3255e120b916 - https://latest.datasette.io/fixtures/attraction_characteristic/2.json?_extras=foreign_key_tables

That will need to be made consistent with the new mechanism. I think ?_extra=a&_extra=b is more consistent with other Datasette features (like ?_facet=col1&_facet=col2) but potentially quite verbose.

So I could support ?_extra=a,b,c as an alternative allowed syntax, or I could allow ?_extra=single and ?_extras=comma,separated.

I think I prefer allowing commas in ?_extra=.

simonw avatar Oct 20 '20 16:10 simonw

In the documentation for ?_extra= I think I'll emphasize the comma-separated version of it. Also: there will be ?_extra= values which act as aliases for collection combinations - e.g. ?_extra=full will toggle everything.

simonw avatar Oct 20 '20 22:10 simonw

I think it's worth having a plugin hook for this - it can be same hook that is used internally. Maybe register_extra - it lets you return one or more extra implementations, each with a name and an async function that gets called.

Things like suggested facets will become register_extra hooks. Maybe actual facets too?

simonw avatar Oct 20 '20 22:10 simonw

This is now blocking https://github.com/simonw/datasette-graphql/issues/61 because that issue needs a way to turn off suggested facets when retrieving the results of a table query.

simonw avatar Oct 20 '20 23:10 simonw

I think I should prioritize the facets component of this, since that could have significant performance wins while also supporting datasette-graphql.

simonw avatar Oct 21 '20 00:10 simonw

This is relevant to the big refactor in:

  • #1518

simonw avatar Dec 15 '21 18:12 simonw

I spotted in https://github.com/simonw/datasette/issues/1719#issuecomment-1108888494 that there's actually already an undocumented implementation of ?_extras=foreign_key_tables - https://latest.datasette.io/fixtures/simple_primary_key/1.json?_extras=foreign_key_tables

I added that feature all the way back in November 2017! https://github.com/simonw/datasette/commit/a30c5b220c15360d575e94b0e67f3255e120b916

simonw avatar Apr 25 '22 18:04 simonw

As suggested in this issue:

  • #1721

There are three parts of the Datasette API that need to support extras:

  • Table, e.g. https://latest.datasette.io/fixtures/facetable.json
  • Row, e.g. https://latest.datasette.io/fixtures/facetable/1.json
  • Query, e.g. https://latest.datasette.io/fixtures/neighborhood_search.json or https://latest.datasette.io/fixtures.json?sql=%0Aselect+_neighborhood%2C+facet_cities.name%2C+state%0Afrom+facetable%0A++++join+facet_cities%0A++++++++on+facetable._city_id+%3D+facet_cities.id%0Awhere+_neighborhood+like+%27%25%27+||+%3Atext+||+%27%25%27%0Aorder+by+_neighborhood%3B%0A&text=

There are two other pages I should consider though:

  • https://latest.datasette.io/.json - the JSON version of the https://latest.datasette.io/ homepage
  • https://latest.datasette.io/fixtures.json - note that this is different from the same URL with ?sql=... appended to it. This is the index of tables in a specific database

simonw avatar Jan 17 '23 01:01 simonw

I'm not actually too happy about how /fixtures.json currently entirely changes shape based on whether or not you pass a ?sql= argument to it.

Maybe I can fix that disparity with extras too?

The list of tables you see on /fixtures.json without the ?sql= could become another extra. The HTML version of that page could know to request that extra by default.

This would also support running a SQL query but also returning a list of tables - which can be useful for building a SQL editor interface which hints at the tables that are available to the user - or even for generating the configuration needed by the CodeMirror editor's SQL completion, added in:

  • #1893

simonw avatar Jan 17 '23 02:01 simonw

I'm tempted NOT to document the JSON for the /.json page, simply because I'm not at all convinced that the current homepage design is the best possible use of that space - and I'd like to reserve the opportunity to redesign that in e.g. Datasette 1.1 without it being a breaking change to the documented JSON API.

simonw avatar Jan 17 '23 02:01 simonw

Thinking about ?_extra= values just for the table JSON. The default shape will look like this:

{
  "ok": true,
  "rows": [{"id": 1, "name": "Name"}],
  "next": null,
}

The table extras could be:

  • count - adds a "count" field with a full count(*) for that filtered table
  • next_url - the full URL to the next page
  • columns - adds "columns": ["id", "name"]
  • expandable_columns - a list of columns that can be expanded (note that "expanded_columns": [...] shows up automatically if the user passes any ?_label= options, like on https://latest.datasette.io/fixtures/facetable.json?_label=_city_id ) - I'm tempted to rename this to label_columns and have it add both label_columns and label_columns_selected or similar.
  • primary_keys - a list of primary keys e.g. ["id"] - not sure what to do about rowid columns here
  • query - a {"sql": "select ...", "params": {"p0": "1"}} object
  • units - the units feature
  • suggested_facets - suggested facets
  • metadata - a {"metadata": {"source_url": "..."}} etc block - differs from current in that it would be nested in "metadata": {...}.

Stuff currently in https://latest.datasette.io/fixtures/facetable.json that is not yet covered by the above:

  "database": "fixtures",
  "table": "facetable",
  "is_view": false,
  "human_description_en": "where id = 1",
  "private": false,
  "allow_execute_sql": true,
  "query_ms": 16.749476999393664,

I'm tempted to bundle database, table, is_view and human_description_en into one (not sure what to call it though, perhaps display_details?) - and then drop allow_execute_sql entirely and have private and query_ms as their own named extras.

simonw avatar Jan 17 '23 02:01 simonw

Or maybe have a permissions extra which includes allow_execute_sql and private? Could anything else go in there?

simonw avatar Jan 17 '23 17:01 simonw

In most cases, the ?_extra=xxx name exactly corresponds to the additional key that is added to the JSON.

?_facet=... is one example of a query string argument that causes an extra key - "facet_results" - to be added to the JSON even though it wasn't requested by name in a ?_extra=.

Am I OK with that? I think so.

Related issue:

  • #1558

Actually there's an edge-case here that's worth considering: it's possible to use metadata to set default facets for a table. If you do this for a table, then .json for that table will always calculate and return those facets - which may be an expensive and unnecessary operation.

So maybe we don't include facet_results in the JSON unless explicitly asked for in that case, but have a rule that ?_facet implies ?_extra=facet_results.

simonw avatar Jan 17 '23 17:01 simonw

I'm going to write code which parses ?_extra= in the comma separated or multiple parameter format and then looks up functions in a dictionary.

It will return an error if you ask for an extra that does not exist.

simonw avatar Jan 20 '23 05:01 simonw

Got first prototype working using asyncinject and it's pretty nice:

diff --git a/datasette/views/table.py b/datasette/views/table.py
index ad45ecd3..c8690b22 100644
--- a/datasette/views/table.py
+++ b/datasette/views/table.py
@@ -2,6 +2,7 @@ import asyncio
 import itertools
 import json
 
+from asyncinject import Registry
 import markupsafe
 
 from datasette.plugins import pm
@@ -538,57 +539,60 @@ class TableView(DataView):
         # Execute the main query!
         results = await db.execute(sql, params, truncate=True, **extra_args)
 
-        # Calculate the total count for this query
-        count = None
-        if (
-            not db.is_mutable
-            and self.ds.inspect_data
-            and count_sql == f"select count(*) from {table_name} "
-        ):
-            # We can use a previously cached table row count
-            try:
-                count = self.ds.inspect_data[database_name]["tables"][table_name][
-                    "count"
-                ]
-            except KeyError:
-                pass
-
-        # Otherwise run a select count(*) ...
-        if count_sql and count is None and not nocount:
-            try:
-                count_rows = list(await db.execute(count_sql, from_sql_params))
-                count = count_rows[0][0]
-            except QueryInterrupted:
-                pass
-
-        # Faceting
-        if not self.ds.setting("allow_facet") and any(
-            arg.startswith("_facet") for arg in request.args
-        ):
-            raise BadRequest("_facet= is not allowed")
+        # Resolve extras
+        extras = _get_extras(request)
+        if request.args.getlist("_facet"):
+            extras.add("facet_results")
 
-        # pylint: disable=no-member
-        facet_classes = list(
-            itertools.chain.from_iterable(pm.hook.register_facet_classes())
-        )
-        facet_results = {}
-        facets_timed_out = []
-        facet_instances = []
-        for klass in facet_classes:
-            facet_instances.append(
-                klass(
-                    self.ds,
-                    request,
-                    database_name,
-                    sql=sql_no_order_no_limit,
-                    params=params,
-                    table=table_name,
-                    metadata=table_metadata,
-                    row_count=count,
-                )
+        async def extra_count():
+            # Calculate the total count for this query
+            count = None
+            if (
+                not db.is_mutable
+                and self.ds.inspect_data
+                and count_sql == f"select count(*) from {table_name} "
+            ):
+                # We can use a previously cached table row count
+                try:
+                    count = self.ds.inspect_data[database_name]["tables"][table_name][
+                        "count"
+                    ]
+                except KeyError:
+                    pass
+
+            # Otherwise run a select count(*) ...
+            if count_sql and count is None and not nocount:
+                try:
+                    count_rows = list(await db.execute(count_sql, from_sql_params))
+                    count = count_rows[0][0]
+                except QueryInterrupted:
+                    pass
+            return count
+
+        async def facet_instances(extra_count):
+            facet_instances = []
+            facet_classes = list(
+                itertools.chain.from_iterable(pm.hook.register_facet_classes())
             )
+            for facet_class in facet_classes:
+                facet_instances.append(
+                    facet_class(
+                        self.ds,
+                        request,
+                        database_name,
+                        sql=sql_no_order_no_limit,
+                        params=params,
+                        table=table_name,
+                        metadata=table_metadata,
+                        row_count=extra_count,
+                    )
+                )
+            return facet_instances
+
+        async def extra_facet_results(facet_instances):
+            facet_results = {}
+            facets_timed_out = []
 
-        async def execute_facets():
             if not nofacet:
                 # Run them in parallel
                 facet_awaitables = [facet.facet_results() for facet in facet_instances]
@@ -607,9 +611,13 @@ class TableView(DataView):
                         facet_results[key] = facet_info
                     facets_timed_out.extend(instance_facets_timed_out)
 
-        suggested_facets = []
+            return {
+                "results": facet_results,
+                "timed_out": facets_timed_out,
+            }
 
-        async def execute_suggested_facets():
+        async def extra_suggested_facets(facet_instances):
+            suggested_facets = []
             # Calculate suggested facets
             if (
                 self.ds.setting("suggest_facets")
@@ -624,8 +632,15 @@ class TableView(DataView):
                 ]
                 for suggest_result in await gather(*facet_suggest_awaitables):
                     suggested_facets.extend(suggest_result)
+            return suggested_facets
+
+        # Faceting
+        if not self.ds.setting("allow_facet") and any(
+            arg.startswith("_facet") for arg in request.args
+        ):
+            raise BadRequest("_facet= is not allowed")
 
-        await gather(execute_facets(), execute_suggested_facets())
+        # pylint: disable=no-member
 
         # Figure out columns and rows for the query
         columns = [r[0] for r in results.description]
@@ -732,17 +747,56 @@ class TableView(DataView):
             rows = rows[:page_size]
 
         # human_description_en combines filters AND search, if provided
-        human_description_en = filters.human_description_en(
-            extra=extra_human_descriptions
-        )
+        async def extra_human_description_en():
+            human_description_en = filters.human_description_en(
+                extra=extra_human_descriptions
+            )
+            if sort or sort_desc:
+                human_description_en = " ".join(
+                    [b for b in [human_description_en, sorted_by] if b]
+                )
+            return human_description_en
 
         if sort or sort_desc:
             sorted_by = "sorted by {}{}".format(
                 (sort or sort_desc), " descending" if sort_desc else ""
             )
-            human_description_en = " ".join(
-                [b for b in [human_description_en, sorted_by] if b]
-            )
+
+        async def extra_next_url():
+            return next_url
+
+        async def extra_columns():
+            return columns
+
+        async def extra_primary_keys():
+            return pks
+
+        registry = Registry(
+            extra_count,
+            extra_facet_results,
+            extra_suggested_facets,
+            facet_instances,
+            extra_human_description_en,
+            extra_next_url,
+            extra_columns,
+            extra_primary_keys,
+        )
+
+        results = await registry.resolve_multi(
+            ["extra_{}".format(extra) for extra in extras]
+        )
+        data = {
+            "ok": True,
+            "rows": rows[:page_size],
+            "next": next_value and str(next_value) or None,
+        }
+        data.update({
+            key.replace("extra_", ""): value
+            for key, value in results.items()
+            if key.startswith("extra_")
+            and key.replace("extra_", "") in extras
+        })
+        return Response.json(data, default=repr)
 
         async def extra_template():
             nonlocal sort
@@ -1334,3 +1388,11 @@ class TableDropView(BaseView):
 
         await db.execute_write_fn(drop_table)
         return Response.json({"ok": True}, status=200)
+
+
+def _get_extras(request):
+    extra_bits = request.args.getlist("_extra")
+    extras = set()
+    for bit in extra_bits:
+        extras.update(bit.split(","))
+    return extras

With that in place, http://127.0.0.1:8001/content/releases?author=25778&_size=1&_extra=count,primary_keys,columns&_facet=author returns:

{
  "ok": true,
  "rows": [
    {
      "html_url": "https://github.com/eyeseast/geocode-sqlite/releases/tag/0.1.2",
      "id": 30926270,
      "author": {
        "value": 25778,
        "label": "eyeseast"
      },
      "node_id": "MDc6UmVsZWFzZTMwOTI2Mjcw",
      "tag_name": "0.1.2",
      "target_commitish": "master",
      "name": "v0.1.2",
      "draft": 0,
      "prerelease": 1,
      "created_at": "2020-09-08T17:48:24Z",
      "published_at": "2020-09-08T17:50:15Z",
      "body": "Basic API is in place, with CLI support for Google, Bing, MapQuest and Nominatum (OSM) geocoders.",
      "repo": {
        "value": 293361514,
        "label": "geocode-sqlite"
      },
      "reactions": null,
      "mentions_count": null
    }
  ],
  "next": "30926270",
  "primary_keys": [
    "id"
  ],
  "columns": [
    "html_url",
    "id",
    "author",
    "node_id",
    "tag_name",
    "target_commitish",
    "name",
    "draft",
    "prerelease",
    "created_at",
    "published_at",
    "body",
    "repo",
    "reactions",
    "mentions_count"
  ],
  "count": 25,
  "facet_results": {
    "results": {
      "author": {
        "name": "author",
        "type": "column",
        "hideable": true,
        "toggle_url": "/content/releases?author=25778&_size=1&_extra=count%2Cprimary_keys%2Ccolumns",
        "results": [
          {
            "value": 25778,
            "label": "eyeseast",
            "count": 25,
            "toggle_url": "http://127.0.0.1:8001/content/releases?_size=1&_extra=count%2Cprimary_keys%2Ccolumns&_facet=author",
            "selected": true
          }
        ],
        "truncated": false
      }
    },
    "timed_out": []
  }
}

simonw avatar Jan 21 '23 01:01 simonw

Implementing this to work with the .json extension is going to be a lot harder.

The challenge here is that we're working with the whole BaseView() v.s. TableView() abstraction, which I've been wanting to get rid of for a long time.

BaseView() calls .data() and expects to get back a (data, extra_template_data, templates) tuple - then if a format is in play (.json or .geojson or similar from a plugin) it hands off data to that. If .csv is involved it does something special, in order to support streaming responses. And if it's regular HTML it calls await extra_template_data() and combines that with data and passes it to the template.

I want this to work completely differently: I want the formats (including HTML) to have the option of adding some extra ?_extra= extras, then I want HTML to be able to render the page entirely from the JSON if necessary.

simonw avatar Jan 21 '23 04:01 simonw

I pushed my prototype so far, going to start a draft PR for it.

simonw avatar Jan 21 '23 04:01 simonw

It's annoying that the https://docs.datasette.io/en/0.64.1/plugin_hooks.html#register-output-renderer-datasette plugin hook passes rows as "list of sqlite3.Row objects" - I'd prefer it if that plugin hook worked with JSON data, not sqlite3.Row.

https://docs.datasette.io/en/0.64.1/plugin_hooks.html#render-cell-row-value-column-table-database-datasette is documented as accepting Row but actually gets CustomRow, see:

  • #1973

simonw avatar Jan 21 '23 05:01 simonw

Maybe "rows" should be a default ?_extra=... but it should be possible to request "arrays" instead which would be a list of arrays, more suitable perhaps for custom renderers such as the CSV one.

This could be quite neat, in that EVERY key in the JSON representation would be defined as an extra - just some would be on by default. There could even be a mechanism for turning them back off again, maybe using ?_extra=-rows.

In which case maybe ?_extra= isn't actually the right name for this feature. It could be ?_key= perhaps, or ?_field=.

Being able to pass ?_field=count,-rows to get back just the count (and skip executing the count entirely) would be pretty neat.

Although ?_only=count would be tidier. So maybe the pair of ?_only= and ?_extra= would make sense.

Would ?_only=rows still return the "ok" field so you can always look at that to confirm an error didn't occur?

simonw avatar Jan 21 '23 05:01 simonw

This issue here would benefit from some kid of mechanism for returning just the HTML of the table itself, without any of the surrounding material. I'm not sure if that would make sense as an extra or not:

  • https://github.com/simonw/datasette-search-all/issues/17

simonw avatar Jan 25 '23 21:01 simonw

I think that does make sense: ?_extra=table perhaps, which would add {"table": "..."}.

simonw avatar Feb 05 '23 22:02 simonw

Just realized that it's useful to be able to tell what parameters were used to generate a page... but reflecting things like _next back in the JSON is confusing in the presence of next.

So I'm going to add an extra for that information too.

Not sure what to call it though:

  • params - confusing because in the code that's usually used for params passed to SQL queries
  • query_string - wouldn't that be a string, not params as a dictionary?

I'm going to experiment with a request extra that returns some bits of information about the request.

simonw avatar Feb 08 '23 18:02 simonw

I just landed this PR so this feature is now in main:

  • #1999

Still needs documentation and maybe some extra tests too.

simonw avatar Mar 22 '23 22:03 simonw

I need to get the arbitrary query page to return the same format. It likely won't have nearly as many extras.

simonw avatar Mar 29 '23 06:03 simonw