datasette icon indicating copy to clipboard operation
datasette copied to clipboard

"View all" option for facets, to provide a (paginated) list of ALL of the facet counts plus a link to view them

Open simonw opened this issue 5 years ago • 7 comments

Can use /database/-/... namespace from #296

simonw avatar Nov 02 '20 19:11 simonw

URL design:

/database/table/-/facet/colname

And for other types of facet (to be supported later):

/database/table/-/facet/colname?_type=m2m

simonw avatar Nov 02 '20 20:11 simonw

Implementing pagination for facets will be interesting. Would be easier if I had a nicer reusable internal pagination mechanism, which is also needed for #856 (pagination of canned queries).

simonw avatar Nov 02 '20 20:11 simonw

Maybe this ends up being code that defers to a simulated canned query, rendered using the existing query.html template.

simonw avatar Nov 02 '20 20:11 simonw

For regular column faceting, here's the query that is used:

https://github.com/simonw/datasette/blob/13d1228d80c91d382a05b1a9549ed02c300ef851/datasette/facets.py#L196-L204

Since it uses order by count desc, value maybe those values could be used to implement cursor-based pagination.

That wouldn't be robust in the face of changing data, but I'm not sure it's possible to implement paginated faceting in a way that survives ongoing changes to the underlying data.

simonw avatar Nov 02 '20 20:11 simonw

Also relevant to this issue: #830 - redesigning the facet plugin hook in preparation for Datasette 1.0. And #972 supporting faceting against arbitrary queries.

simonw avatar Nov 02 '20 20:11 simonw

Had a new, different idea for how this could work: support a ?_group_count=colname parameter to the table view, which turns the page into a select colname, count(*) ... group by colname query - but keeps things like the filter interface, facet selection, search box and so on.

simonw avatar Feb 04 '22 06:02 simonw

An initial prototype of that in my local group-count branch quickly started running into problems:

diff --git a/datasette/views/table.py b/datasette/views/table.py
index be9e9c3..d30efe1 100644
--- a/datasette/views/table.py
+++ b/datasette/views/table.py
@@ -105,8 +105,12 @@ class RowTableShared(DataView):
                 type_ = "integer"
                 notnull = 0
             else:
-                type_ = column_details[r[0]].type
-                notnull = column_details[r[0]].notnull
+                try:
+                    type_ = column_details[r[0]].type
+                    notnull = column_details[r[0]].notnull
+                except KeyError: # Probably count(*)
+                    type_ = "integer"
+                    notnull = False
             columns.append(
                 {
                     "name": r[0],
@@ -613,6 +617,15 @@ class TableView(RowTableShared):
             offset=offset,
         )
 
+        # If ?_group_count we convert the SQL query here
+        group_count = request.args.getlist("_group_count")
+        if group_count:
+            wrapped_sql = "select {cols}, count(*) from ({sql}) group by {cols}".format(
+                cols=", ".join(group_count),
+                sql=sql,
+            )
+            sql = wrapped_sql
+
         if request.args.get("_timelimit"):
             extra_args["custom_time_limit"] = int(request.args.get("_timelimit"))

Resulted in errors like this one:

    pk_path = path_from_row_pks(row, pks, not pks, False)
  File "/Users/simon/Dropbox/Development/datasette/datasette/utils/__init__.py", line 82, in path_from_row_pks
    bits = [
  File "/Users/simon/Dropbox/Development/datasette/datasette/utils/__init__.py", line 83, in <listcomp>
    row[pk]["value"] if isinstance(row[pk], dict) else row[pk] for pk in pks
IndexError: No item with that key

simonw avatar Feb 04 '22 06:02 simonw