
Refactor .csv to be an output renderer - and teach register_output_renderer to stream all rows

Open simonw opened this issue 5 years ago • 5 comments

This can drive the upgrade of the register_output_renderer hook to be able to handle streaming all rows in a large query.

simonw avatar Oct 29 '20 21:10 simonw

Relevant code: https://github.com/simonw/datasette/blob/d6f9ff71378c4eab34dad181c23cfc143a4aef2d/datasette/views/base.py#L258-L345

simonw avatar Oct 29 '20 21:10 simonw

Implementing this would make #1356 a whole lot more interesting.

simonw avatar Jun 03 '21 05:06 simonw

I can get regular .json to stream too, using the pattern described in this TIL: https://til.simonwillison.net/python/output-json-array-streaming
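The TIL's pattern, as a minimal sketch (the generator name here is mine, not Datasette's):

```python
import json

def stream_json_array(rows):
    # Yield fragments that concatenate to a valid JSON array,
    # emitting a comma before every element except the first.
    yield "["
    first = True
    for row in rows:
        if first:
            first = False
        else:
            yield ","
        yield json.dumps(row)
    yield "]"
```

Joining the fragments for two rows gives `[{"id": 1},{"id": 2}]`; in a streaming HTTP response each fragment would be sent as its own body chunk instead of being joined in memory.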

simonw avatar Mar 15 '22 18:03 simonw

for teaching register_output_renderer to stream it seems like the two options are to

  1. a nested query technique to paginate through
  2. a fetching model that looks something like:

with sqlite_timelimit(conn, time_limit_ms):
    c.execute(query)
    while True:
        chunk = c.fetchmany(chunk_size)
        if not chunk:
            break
        yield from chunk

currently db.execute is not a generator, so this would probably need a new method?
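For option 1, a keyset-pagination sketch over a plain rowid table (the function name is hypothetical; arbitrary queries would instead need ordering by an explicit unique column, since subqueries don't expose rowid):

```python
import sqlite3

def paginate_table(conn, table, page_size=100):
    # Keyset pagination: re-query for the next page using the last
    # seen rowid, so no statement stays open between pages.
    last = 0
    while True:
        rows = conn.execute(
            f"SELECT rowid, * FROM [{table}] WHERE rowid > ? "
            "ORDER BY rowid LIMIT ?",
            (last, page_size),
        ).fetchall()
        if not rows:
            break
        for row in rows:
            yield row[1:]  # drop the rowid key column
        last = rows[-1][0]
```

Because each page is its own short query, each one can run under its own time limit, unlike the single long-lived cursor in option 2.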

fgregg avatar Sep 28 '22 12:09 fgregg

if you went this route:

with sqlite_timelimit(conn, time_limit_ms):
    c.execute(query)
    while True:
        chunk = c.fetchmany(chunk_size)
        if not chunk:
            break
        yield from chunk

then time_limit_ms would probably have to be greatly extended, because the time spent in the loop will depend on the downstream processing.

i wonder if this was why you were thinking this feature would need a dedicated connection?
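For reference, sqlite_timelimit works roughly like this (a paraphrase of datasette.utils, so treat the details as approximate): a progress handler compares wall-clock time to a deadline on every batch of opcodes, so time the generator spends blocked on downstream processing while the statement is still being stepped counts against the limit.

```python
import sqlite3
import time
from contextlib import contextmanager

@contextmanager
def sqlite_timelimit(conn, ms):
    # The handler runs every n SQLite opcodes; returning non-zero
    # aborts the statement with OperationalError: interrupted.
    deadline = time.perf_counter() + ms / 1000
    n = 1 if ms < 50 else 10000
    def handler():
        if time.perf_counter() >= deadline:
            return 1
    conn.set_progress_handler(handler, n)
    try:
        yield
    finally:
        conn.set_progress_handler(None, n)
```

Since the deadline is wall clock rather than query CPU time, a slow consumer of the yielded chunks would trip the limit mid-stream.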


reading more, there's no real limit i can find on the number of active cursors (or, more precisely, active prepared statement objects, because sqlite doesn't really have cursors).

maybe something like this would be okay?

with sqlite_timelimit(conn, time_limit_ms):
    c.execute(query)
    # step at least once inside the time limit to evaluate the statement;
    # not sure if this is necessary
    yield c.fetchone()
while True:
    chunk = c.fetchmany(chunk_size)
    if not chunk:
        break
    yield from chunk

it seems quite weird that there isn't a tighter limit on the number of active prepared statements, but i haven't been able to find one.
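One way to check this empirically (a sketch using Python's sqlite3 module; SQLite's documented SQLITE_LIMIT categories don't include a prepared-statement count, so memory appears to be the practical bound):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(100)])

# Open many cursors on one connection, leave each statement
# mid-iteration, then drain them all: nothing complains about
# the number of simultaneously active statements.
cursors = []
for _ in range(200):
    c = conn.cursor()
    c.execute("SELECT x FROM t")
    c.fetchone()  # statement is now partially stepped
    cursors.append(c)

remaining = [len(c.fetchall()) for c in cursors]
```

All 200 cursors finish with their 99 remaining rows, suggesting that holding a streaming statement open alongside other queries on the same connection is at least not forbidden.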

fgregg avatar Sep 28 '22 13:09 fgregg