Refactor .csv to be an output renderer - and teach register_output_renderer to stream all rows
This can drive the upgrade of the register_output_renderer hook to be able to handle streaming all rows in a large query.
Relevant code: https://github.com/simonw/datasette/blob/d6f9ff71378c4eab34dad181c23cfc143a4aef2d/datasette/views/base.py#L258-L345
Implementing this would make #1356 a whole lot more interesting.
I can get regular .json to stream too, using the pattern described in this TIL: https://til.simonwillison.net/python/output-json-array-streaming
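Roughly, that pattern streams the array punctuation around each row as it is serialized, rather than building the whole response in memory first. A minimal sketch (not the TIL's exact code, and `stream_json_array` is a made-up name):

```python
import json

def stream_json_array(rows):
    # Emit a valid JSON array one row at a time instead of
    # materializing the entire list of rows in memory
    yield "["
    for i, row in enumerate(rows):
        if i:
            yield ","
        yield json.dumps(row)
    yield "]"
```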
for teaching register_output_renderer to stream, it seems like the two options are:
- a nested query technique to paginate through (rough sketch below, after the fetching example)
- a fetching model that looks something like this:
```python
with sqlite_timelimit(conn, time_limit_ms):
    c.execute(query)
    while chunk := c.fetchmany(chunk_size):
        yield from chunk
```
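for the first option, a rough sketch of the nested query idea, using plain limit/offset against a regular sqlite3 connection (`paginate_query` is a made-up helper, and datasette's real keyset pagination would be smarter than offset):

```python
def paginate_query(conn, query, page_size=1000):
    # wrap the original query in an outer select and page through it,
    # so each individual execution stays well inside the time limit
    offset = 0
    while True:
        rows = conn.execute(
            f"select * from ({query}) limit ? offset ?",
            (page_size, offset),
        ).fetchall()
        if not rows:
            break
        yield from rows
        offset += page_size
```

offset pagination re-runs the inner query from the start for every page, so it degrades on deep pages; keyset pagination on a sort column avoids that, but the overall shape is the same.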
currently db.execute is not a generator, so the fetchmany approach would probably need a new method?
if you went this route:
```python
with sqlite_timelimit(conn, time_limit_ms):
    c.execute(query)
    while chunk := c.fetchmany(chunk_size):
        yield from chunk
```
then time_limit_ms would probably have to be greatly extended, because the time spent in the loop will depend on the downstream processing.
i wonder if this was why you were thinking this feature would need a dedicated connection?
reading more, there's no real limit i can find on the number of active cursors (or more precisely, active prepared statement objects, because sqlite doesn't really have cursors).
maybe something like this would be okay?
```python
with sqlite_timelimit(conn, time_limit_ms):
    c.execute(query)
    # step through at least one row to evaluate the statement, not sure if this is necessary
    yield c.fetchone()
# the remaining rows are fetched outside the time limit
while chunk := c.fetchmany(chunk_size):
    yield from chunk
```
it seems quite weird that there isn't more of a limit on the number of active prepared statements, but i haven't been able to find one.
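a quick way to poke at that with the standard library sqlite3 module (just a throwaway experiment, not datasette code):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (n integer)")
conn.executemany("insert into t values (?)", [(i,) for i in range(1000)])

# open a lot of cursors, each holding an executed statement with rows still pending
cursors = [conn.execute("select n from t") for _ in range(10_000)]
for c in cursors:
    c.fetchone()

print(len(cursors), "cursors open with rows still pending")
```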