
Refactor .csv to be an output renderer - and teach register_output_renderer to stream all rows

Open simonw opened this issue 5 years ago • 5 comments

This can drive the upgrade of the register_output_renderer hook to be able to handle streaming all rows in a large query.

simonw avatar Oct 29 '20 21:10 simonw

Relevant code: https://github.com/simonw/datasette/blob/d6f9ff71378c4eab34dad181c23cfc143a4aef2d/datasette/views/base.py#L258-L345

simonw avatar Oct 29 '20 21:10 simonw

Implementing this would make #1356 a whole lot more interesting.

simonw avatar Jun 03 '21 05:06 simonw

I can get regular .json to stream too, using the pattern described in this TIL: https://til.simonwillison.net/python/output-json-array-streaming
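The TIL's pattern, as a minimal sketch (the generator name here is mine, not Datasette's):

```python
import json

def stream_json_array(rows):
    # Yield fragments that concatenate to a valid JSON array,
    # emitting a comma before every element except the first.
    yield "["
    first = True
    for row in rows:
        if first:
            first = False
        else:
            yield ","
        yield json.dumps(row)
    yield "]"
```

Joining the fragments for two rows gives `[{"id": 1},{"id": 2}]`; in a streaming HTTP response each fragment would be sent as its own body chunk instead of being joined in memory.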

simonw avatar Mar 15 '22 18:03 simonw

for teaching register_output_renderer to stream it seems like the two options are to

  1. a nested query technique to paginate through
  2. a fetching model that looks something like:

with sqlite_timelimit(conn, time_limit_ms):
    c.execute(query)
    while True:
        chunk = c.fetchmany(chunk_size)
        if not chunk:
            break
        yield from chunk

currently db.execute is not a generator, so this would probably need a new method?
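For option 1, a keyset-pagination sketch over a plain rowid table (the function name is hypothetical; arbitrary queries would instead need ordering by an explicit unique column, since subqueries don't expose rowid):

```python
import sqlite3

def paginate_table(conn, table, page_size=100):
    # Keyset pagination: re-query for the next page using the last
    # seen rowid, so no statement stays open between pages.
    last = 0
    while True:
        rows = conn.execute(
            f"SELECT rowid, * FROM [{table}] WHERE rowid > ? "
            "ORDER BY rowid LIMIT ?",
            (last, page_size),
        ).fetchall()
        if not rows:
            break
        for row in rows:
            yield row[1:]  # drop the rowid key column
        last = rows[-1][0]
```

Because each page is its own short query, each one can run under its own time limit, unlike the single long-lived cursor in option 2.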

fgregg avatar Sep 28 '22 12:09 fgregg

if you went this route:

with sqlite_timelimit(conn, time_limit_ms):
    c.execute(query)
    while True:
        chunk = c.fetchmany(chunk_size)
        if not chunk:
            break
        yield from chunk

then time_limit_ms would probably have to be greatly extended, because the time spent in the loop will depend on the downstream processing.

i wonder if this was why you were thinking this feature would need a dedicated connection?
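For reference, sqlite_timelimit works roughly like this (a paraphrase of datasette.utils, so treat the details as approximate): a progress handler compares wall-clock time to a deadline on every batch of opcodes, so time the generator spends blocked on downstream processing while the statement is still being stepped counts against the limit.

```python
import sqlite3
import time
from contextlib import contextmanager

@contextmanager
def sqlite_timelimit(conn, ms):
    # The handler runs every n SQLite opcodes; returning non-zero
    # aborts the statement with OperationalError: interrupted.
    deadline = time.perf_counter() + ms / 1000
    n = 1 if ms < 50 else 10000
    def handler():
        if time.perf_counter() >= deadline:
            return 1
    conn.set_progress_handler(handler, n)
    try:
        yield
    finally:
        conn.set_progress_handler(None, n)
```

Since the deadline is wall clock rather than query CPU time, a slow consumer of the yielded chunks would trip the limit mid-stream.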


reading more, there's no real limit i can find on the number of active cursors (or, more precisely, active prepared statement objects, because sqlite doesn't really have cursors).

maybe something like this would be okay?

with sqlite_timelimit(conn, time_limit_ms):
    c.execute(query)
    # step at least once inside the time limit to evaluate the statement;
    # not sure if this is necessary
    yield c.fetchone()
while True:
    chunk = c.fetchmany(chunk_size)
    if not chunk:
        break
    yield from chunk

it seems quite weird that there isn't a tighter limit on the number of active prepared statements, but i haven't been able to find one.
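One way to check this empirically (a sketch using Python's sqlite3 module; SQLite's documented SQLITE_LIMIT categories don't include a prepared-statement count, so memory appears to be the practical bound):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(100)])

# Open many cursors on one connection, leave each statement
# mid-iteration, then drain them all: nothing complains about
# the number of simultaneously active statements.
cursors = []
for _ in range(200):
    c = conn.cursor()
    c.execute("SELECT x FROM t")
    c.fetchone()  # statement is now partially stepped
    cursors.append(c)

remaining = [len(c.fetchall()) for c in cursors]
```

All 200 cursors finish with their 99 remaining rows, suggesting that holding a streaming statement open alongside other queries on the same connection is at least not forbidden.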

fgregg avatar Sep 28 '22 13:09 fgregg