Complete refactor of TableView and table.html template
Split from #878. The current `TableView` class is by far the most complex part of Datasette, and the most difficult to work on: https://github.com/simonw/datasette/blob/0.59.2/datasette/views/table.py
In #878 I started exploring a new pattern for building views. In doing so it became clear that `TableView` is the first beast that I need to slay - if I can refactor that into something neat, the pattern for building other views will emerge as a natural consequence.
I've been trying to build this as a `register_routes()` plugin, as originally suggested in #870 - though unfortunately it looks like those plugins can't replace existing Datasette default views at the moment, see #1517. [UPDATE: I was wrong about this, plugins can over-ride default views just fine]
I also know that I want to have a fully documented template context for `table.html` as a major step on the way to Datasette 1.0, see #1510.
All of this adds up to the `TableView` refactor being a major project that will unblock a whole flurry of other things - so I'm going to work on that in this separate issue.
Here's where I got to with my hacked-together initial plugin prototype - it managed to render the table page with some rows on it (and a bunch of missing functionality such as filters): https://gist.github.com/simonw/281eac9c73b062c3469607ad86470eb2

Ideally I'd like to execute the existing test suite against the new implementation - that would require me to solve this so I can replace the view with the plugin version though:
- #1517
I was wrong about that, you CAN over-ride default routes already.
A (likely incomplete) list of features on the table page:
- [ ] Display table/database/instance metadata
- [ ] Show count of all results
- [ ] Display table of results
- [ ] Special table display treatment for URLs, numbers
- [ ] Allow plugins to modify table cells
- [ ] Respect `?_col=` and `?_nocol=`
- [ ] Show interface for filtering by columns and operations
- [ ] Show search box, support executing FTS searches
- [ ] Sort table by specified column
- [ ] Paginate table
- [ ] Show facet results
- [ ] Show suggested facets
- [ ] Link to available exports
- [ ] Display schema for table
- [ ] Maybe it should show the SQL for the query too?
- [ ] Handle various non-obvious querystring options, like `?_where=` and `?_through=`
My goal is to break up a lot of this functionality into separate methods. These methods can be executed in parallel by `asyncinject`, but more importantly they can be used to build a much better JSON representation, where the default representation is lighter and `?_extra=x` options can be used to execute more expensive portions and add them to the response.
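To make that concrete, here's a hypothetical before/after of the JSON shape - the key and extra names are illustrative, not a committed design:

```python
# Hypothetical default representation: cheap to compute, minimal keys.
default_response = {
    "rows": [["1", "hello"], ["2", "world"]],
    "next": None,
}

# With e.g. ?_extra=count&_extra=facet_results the more expensive pieces
# would be computed and merged into the response:
response_with_extras = {
    **default_response,
    "count": 2,
    "facet_results": {},
}
```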
So the HTML version itself needs to be re-written to use those JSON extras.
Right now the HTML version gets to cheat - it passes through objects that are not JSON serializable, including custom functions that can then be called by Jinja.
I'm interested in maybe removing this cheating - if the HTML version could only request JSON-serializable extras those could be exposed in the API as well.
It would also help clean up the kind-of-nasty pattern I use in the current `BaseView` where everything returns both a bunch of JSON-serializable data AND an awaitable function that then gets to add extra things to the HTML context.
... and while I'm doing all of this I can rewrite the templates to not use those cheating magical functions AND document the template context at the same time, refs:
- #1510.
Very confused by this piece of code here: https://github.com/simonw/datasette/blob/1c13e1af0664a4dfb1e69714c56523279cae09e4/datasette/views/table.py#L37-L63
I added it in https://github.com/simonw/datasette/commit/754836eef043676e84626c4fd3cb993eed0d2976 - in the new world that should probably be replaced by pure JSON.
Aha - this comment explains it: https://github.com/simonw/datasette/issues/521#issuecomment-505279560
I think the trick is to redefine what a "cell_row" is. Each row is currently a list of cells:
https://github.com/simonw/datasette/blob/6341f8cbc7833022012804dea120b838ec1f6558/datasette/views/table.py#L159-L163
I can redefine the row (the `cells` variable in the above example) as a thing-that-iterates-cells (hence behaving like a list) but that also supports `__getitem__` access for looking up cell values if you know the name of the column.
The goal was to support neater custom templates like this:
```
{% for row in display_rows %}
    <h2 class="scientist">{{ row["First_Name"] }} {{ row["Last_Name"] }}</h2>
    ...
```
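A minimal sketch of the kind of wrapper that could support both styles - the class name and constructor here are illustrative, not the actual implementation:

```python
class CellRow:
    """Behaves like a list of cells, but also supports row["column_name"] lookups."""

    def __init__(self, columns, cells):
        self.columns = columns  # list of column names, in display order
        self.cells = cells      # list of rendered cell values, same order as columns

    def __iter__(self):
        return iter(self.cells)

    def __len__(self):
        return len(self.cells)

    def __getitem__(self, key):
        # An integer index behaves like a regular list lookup;
        # a string key looks the cell up by its column name.
        if isinstance(key, int):
            return self.cells[key]
        return self.cells[self.columns.index(key)]
```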
This may be an argument for continuing to allow non-JSON-objects through to the HTML templates. Need to think about that a bit more.
I'm going to try leaning into the `asyncinject` mechanism a bit here. One method can execute and return the raw rows. Another can turn that into the default minimal JSON representation. Then a third can take that (or take both) and use it to inflate out the JSON that the HTML template needs, with those extras and with the rendered cells from plugins.
> This may be an argument for continuing to allow non-JSON-objects through to the HTML templates. Need to think about that a bit more.
I can definitely support this using pure-JSON - I could make two versions of the row available, one that's an array of cell objects and the other that's an object mapping column names to column raw values.
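For example, the same row could be made available in both forms - the exact shapes here are illustrative:

```python
# 1. An array of cell objects, preserving column order for table rendering:
row_as_cells = [
    {"column": "First_Name", "value": "Ada"},
    {"column": "Last_Name", "value": "Lovelace"},
]

# 2. A mapping of column name to raw value, for row["First_Name"] style lookups:
row_as_dict = {"First_Name": "Ada", "Last_Name": "Lovelace"}
```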
Two new requirements inspired by work on the `datasette-table` (and `datasette-notebook`) projects:
- #1533
- #1534
I'm also going to use the new `datasette-table` Web Component to help guide the design of the new API, which relates directly to this issue too:
- #1532
Aside: is there any reason this work can't complete the long-running goal of merging the TableView and QueryView, such that most of the features available for tables become available for arbitrary queries too?
I had already mentally committed to implementing facets for queries, but I just realized that filters could work too - using either a CTE or a nested query.
Pagination is the one holdout here, since table pagination uses keyset pagination over a known order. But maybe arbitrary queries can only be paginated if you order them first?
(I could experiment with merging the two views by adding a temporary undocumented `?_sql=` parameter to the in-progress table view that sets an alternative query instead of `select cols from table` - added bonus, this will force me to use introspection against the returned columns rather than mixing in the known columns for the specified table.)
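For the filters-on-arbitrary-queries idea, a CTE wrapper is probably the simplest mechanism - a rough sketch, with the table and column names only there for illustration:

```python
user_sql = "select neighborhood, city_id from facetable"

# Wrap the arbitrary query in a CTE so table-style filters can be applied
# against its output columns, using named parameters as usual:
wrapped_sql = f"""
with user_query as (
    {user_sql}
)
select * from user_query where city_id = :p0 limit 101
"""
params = {"p0": 1}
```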
If I break this up into `@inject` methods, what methods could I have and what would they do?
- `resolve_path`: Use request path to resolve the database and table. Could handle hash URLs too (if I don't manage to extract those to a plugin) - would be nice if this could raise a redirect, but I think that will instead have to be one of the things it returns
- `build_sql`: Builds the SQL query based on the querystring (and some DB introspection)
- `execute_count`: Execute the `count(*)`
- `execute_rows`: Execute the `limit 101` to fetch the rows
- `execute_facets`: Execute all requested facets (could this do its own `asyncio.gather()` to run facets in parallel?)
- `suggest_facets`: Execute facet suggestions
Are there any plugin hooks that would make sense to execute in parallel? Actually there might be: I don't think `extra_template_vars`, `extra_css_urls`, `extra_js_urls`, `extra_body_script` depend on each other, so it might be possible to execute them in a parallel chunk (at least any of them that return awaitables).
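Something along these lines might work for that parallel chunk - a sketch rather than what Datasette does today, assuming `await_me_maybe()` still lives in `datasette.utils`:

```python
import asyncio

from datasette.utils import await_me_maybe


async def resolve_hook_results(hook_results):
    # Each plugin hook implementation may return a plain value or an awaitable;
    # await_me_maybe() normalizes both, and gather() resolves them concurrently.
    return await asyncio.gather(
        *(await_me_maybe(result) for result in hook_results)
    )
```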
I have a hunch that the conclusion of this experiment may end up being that the `asyncinject` trick is kinda neat, but the code will be easier to maintain (while still executing in parallel) if it's written using `asyncio.gather` directly instead.
It's possible `asyncinject` will end up being neat enough that I'll want to keep it though.
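For comparison, the `asyncio.gather` version might look roughly like this, using the method names sketched earlier (the signatures and stubs are hypothetical):

```python
import asyncio


# Hypothetical stand-ins for the methods listed above:
async def execute_count(db, sql_info): ...
async def execute_rows(db, sql_info): ...
async def execute_facets(db, sql_info): ...
async def suggest_facets(db, sql_info): ...


async def table_view_data(db, sql_info):
    # Once build_sql() has produced sql_info, these four are independent of
    # each other, so they can run concurrently:
    count, rows, facet_results, suggested = await asyncio.gather(
        execute_count(db, sql_info),
        execute_rows(db, sql_info),
        execute_facets(db, sql_info),
        suggest_facets(db, sql_info),
    )
    return {
        "count": count,
        "rows": rows,
        "facet_results": facet_results,
        "suggested_facets": suggested,
    }
```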
Rebuilding `TableView` from the ground up is proving not to be much fun. I'm going to explore starting the refactor of the existing code by separating out the bit that generates the SQL query from the rest of it.
The tests for `TableView` are currently mixed in with everything else in `tests/test_api.py` and `tests/test_html.py` - might be good to split those out into `test_table_html.py` and `test_table_api.py` since they're such a key part of how Datasette works.
I don't think this code is necessary any more: https://github.com/simonw/datasette/blob/492f9835aa7e90540dd0c6324282b109f73df71b/datasette/views/table.py#L396-L399
That dates back to when Datasette was built on top of Sanic, and Sanic didn't preserve those query parameters the way I needed it to:
https://github.com/simonw/datasette/blob/1f69269fe93e4cd42e56890126cc0dbcf719c6cb/datasette/views/table.py#L202-L206
No, removing that gave me the following test failure:
```
tests/test_table_api.py::test_table_filter_queries[/fixtures/simple_primary_key.json?content__exact=-expected_rows2] FAILED [100%]
=============================================================================== FAILURES ================================================================================
______________________________________ test_table_filter_queries[/fixtures/simple_primary_key.json?content__exact=-expected_rows2] ______________________________________
app_client = <datasette.utils.testing.TestClient object at 0x10d45d2d0>, path = '/fixtures/simple_primary_key.json?content__exact=', expected_rows = [['3', '']]
@pytest.mark.parametrize(
"path,expected_rows",
[
("/fixtures/simple_primary_key.json?content=hello", [["1", "hello"]]),
(
"/fixtures/simple_primary_key.json?content__contains=o",
[
["1", "hello"],
["2", "world"],
["4", "RENDER_CELL_DEMO"],
],
),
("/fixtures/simple_primary_key.json?content__exact=", [["3", ""]]),
(
"/fixtures/simple_primary_key.json?content__not=world",
[
["1", "hello"],
["3", ""],
["4", "RENDER_CELL_DEMO"],
["5", "RENDER_CELL_ASYNC"],
],
),
],
)
def test_table_filter_queries(app_client, path, expected_rows):
response = app_client.get(path)
> assert expected_rows == response.json["rows"]
E AssertionError: assert [['3', '']] == [['1', 'hello'],\n ['2', 'world'],\n ['3', ''],\n ['4', 'RENDER_CELL_DEMO'],\n ['5', 'RENDER_CELL_ASYNC']]
E At index 0 diff: ['3', ''] != ['1', 'hello']
E Right contains 4 more items, first extra item: ['2', 'world']
E Full diff:
E [
E - ['1',
E - 'hello'],
E - ['2',
E - 'world'],
E ['3',
E ''],
E - ['4',
E - 'RENDER_CELL_DEMO'],
E - ['5',
E - 'RENDER_CELL_ASYNC'],
E ]
/Users/simon/Dropbox/Development/datasette/tests/test_table_api.py:511: AssertionError
```
Idea: in JSON output include a `warnings` block listing any `_` parameters that were not recognized.
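A sketch of how that could work - the whitelist and the warning shape here are made up for illustration:

```python
KNOWN_SPECIAL_ARGS = {"_search", "_sort", "_sort_desc", "_next", "_facet", "_where", "_col", "_nocol"}


def unrecognized_parameter_warnings(request_args):
    # Report any ?_... parameter we don't know about in a "warnings" block,
    # rather than silently ignoring it.
    return [
        {"type": "unrecognized_parameter", "parameter": key}
        for key in request_args
        if key.startswith("_") and key not in KNOWN_SPECIAL_ARGS
    ]
```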
Built a new plugin to help with this work by improving the display of `?_trace=1` output: https://datasette.io/plugins/datasette-pretty-traces
Useful old comment here: https://github.com/simonw/datasette/issues/617#issuecomment-552253893
As noted in #621 (comment), a common pattern in this method is blocks of code that append new items to the `where_clauses`, `params` and `extra_human_descriptions` arrays. This is a useful refactoring opportunity.

Code that fits this pattern:

- The code that builds based on the filters: `where_clauses, params = filters.build_where_clauses(table)` and `human_description_en = filters.human_description_en(extra=extra_human_descriptions)`
- Code that handles `?_where=`: `where_clauses.extend(request.args["_where"])` - though note that this also appends to an `extra_wheres_for_ui` array which nothing else uses
- The `_through=` code, see "Syntax for ?_through= that works as a form field" #621 for details
- The code that deals with `?_search=` FTS

The keyset pagination code modifies `where_clauses` and `params` too, but I don't think it's quite going to work with the same abstraction that would cover the above examples.
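One way to capture that pattern would be a small accumulator object that each block appends to - the names here are illustrative, not a committed design:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class WhereBuilder:
    # The three things most filter blocks contribute:
    where_clauses: List[str] = field(default_factory=list)
    params: Dict[str, Any] = field(default_factory=dict)
    human_descriptions: List[str] = field(default_factory=list)

    def add(self, clause: str, params: Optional[Dict[str, Any]] = None, description: Optional[str] = None):
        self.where_clauses.append(clause)
        if params:
            self.params.update(params)
        if description:
            self.human_descriptions.append(description)


# e.g. the ?_search= block could then contribute its clause in one call:
builder = WhereBuilder()
builder.add(
    "rowid in (select rowid from searchable_fts where searchable_fts match :search)",
    params={"search": "dog"},
    description='search matches "dog"',
)
```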
- `table_actions` should be an extra.
Maybe a better way to approach this would be to focus on the JSON side of things - try to get a basic JSON version with `?_extra=` support working, then eventually build that up to the point where it can power the HTML version.
There are actually four forms of SQL query used by the table page:
- `from_sql` - just the `from table_name where ...`
- `sql_no_order_no_limit` - used for faceting, `"select {select_all_columns} from {table_name} {where}"`
- `sql` - the above but with order and limit clauses: `"select {select_specified_columns} from {table_name} {where}{order_by} limit {page_size}{offset}"`
- `count_sql` - used for the count, built out of `from_sql`: `"select count(*) {from_sql}"`
I'm tempted to encapsulate those in a `Query` class.
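A rough sketch of what that class might look like - the property names mirror the four forms above, everything else is illustrative:

```python
from dataclasses import dataclass


@dataclass
class Query:
    table_name: str
    select_all_columns: str        # e.g. "rowid, *"
    select_specified_columns: str  # respects ?_col= / ?_nocol=
    where: str                     # e.g. " where color = :p0", or "" for no filters
    order_by: str                  # e.g. " order by pk", or ""
    page_size: int
    offset: str                    # e.g. " offset 100", or ""

    @property
    def from_sql(self):
        return f"from {self.table_name}{self.where}"

    @property
    def sql_no_order_no_limit(self):
        return f"select {self.select_all_columns} from {self.table_name}{self.where}"

    @property
    def sql(self):
        return (
            f"select {self.select_specified_columns} from {self.table_name}"
            f"{self.where}{self.order_by} limit {self.page_size}{self.offset}"
        )

    @property
    def count_sql(self):
        return f"select count(*) {self.from_sql}"
```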
Should facets really not be displayed on pages past page one (where `?_next=` is set)? That made sense to me at the time, but I'm now having second thoughts about it.
I guess it's a useful performance tweak for when crawlers keep hitting the `?_next=` link.
Actually it looks like facets DO display on subsequent pages, e.g. on https://global-power-plants.datasettes.com/global-power-plants/global-power-plants?_next=200 - but facet suggestions do not, thanks to this code: https://github.com/simonw/datasette/blob/2c07327d23d9c5cf939ada9ba4091c1b8b2ba42d/datasette/views/table.py#L777-L785
A fundamental operation of this view is to construct the SQL query and accompanying human description based on the incoming query string parameters.
The human description is the bit at the top of https://latest.datasette.io/fixtures/searchable?_search=dog&_sort=pk&_facet=text2&text2=sara+weasel that says:
> 1 row where search matches "dog" and text2 = "sara weasel" sorted by pk
(Also used in the page `<title>`.)
The code actually gathers three things:
- Fragments of the `where` clause, for example `"text2" = :p0`
- Parameters, e.g. `{"p0": "sara weasel"}`
- Human description components, e.g. `text2 = "sara weasel"`
Some operations such as `?_where=` don't currently provide an extra human description component.
`_where=` also doesn't populate a parameter, but maybe it could? Would be neat if in the future `?_where=foo+=+:bar` worked and added a `bar` input field to the screen, as seen with custom queries.
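Detecting the named parameters in a `_where=` fragment could be as simple as a regex - a sketch, not necessarily how Datasette's canned query parameter detection works:

```python
import re


def named_parameters_in_fragment(sql_fragment):
    # "foo = :bar and baz > :min_value" -> ["bar", "min_value"]
    return re.findall(r":([a-zA-Z_][a-zA-Z0-9_]*)", sql_fragment)
```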
I added a ton of comments to the `data()` method which really helps get a better feel for how this all works: https://github.com/simonw/datasette/blob/0663d5525cc41e9260ac7d1f6386d3a6eb5ad2a9/datasette/views/table.py#L322
Is there an opportunity to refactor things using a new plugin hook here? Maybe the `register_filters` hook from #473, where the hook becomes responsible for building where clauses (and human descriptions of them) based on the incoming query string.
That version dealt with `Filter` classes, but those might be a bit too low-level for this.
`?_spatial_within=GEOJSON` was an interesting idea attached to that issue.
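To make that idea concrete, a hypothetical shape for the hook - the hook name comes from the #473 discussion, but everything else here (the signature, the return value, the SQL) is invented:

```python
from datasette import hookimpl


@hookimpl
def register_filters(datasette, database, table):
    # Hypothetical: return callables that inspect the query string and
    # contribute (where_clause, params, human_description) tuples.
    def spatial_within(request_args):
        geojson = request_args.get("_spatial_within")
        if not geojson:
            return None
        return (
            # Invented SQL - a real implementation would use a spatial index
            "rowid in (select rowid from spatial_index where shape_within(:geojson))",
            {"geojson": geojson},
            "is within the supplied GeoJSON shape",
        )

    return [spatial_within]
```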