Streaming CSV spends a lot of time in `table_column_details`
At least I think it does. I tried running `py-spy top -p $PID` against a Datasette process that was trying to do:

```
datasette covid.db --get '/covid/ny_times_us_counties.csv?_size=10&_stream=on'
```
While investigating:
- #1355

I spotted this:
```
datasette covid.db --get '/covid/ny_times_us_counties.csv?_size=10&_stream=on' (python v3.10.2)
Total Samples 5800
GIL: 71.00%, Active: 98.00%, Threads: 4

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)
  8.00%   8.00%    4.32s     4.38s   sql_operation_in_thread (datasette/database.py:212)
  5.00%   5.00%    3.77s     3.93s   table_column_details (datasette/utils/__init__.py:614)
  6.00%   6.00%    3.72s     3.72s   _worker (concurrent/futures/thread.py:81)
  7.00%   7.00%    2.98s     2.98s   _read_from_self (asyncio/selector_events.py:120)
  5.00%   6.00%    2.35s     2.49s   detect_fts (datasette/utils/__init__.py:571)
  4.00%   4.00%    1.34s     1.34s   _write_to_self (asyncio/selector_events.py:140)
```
Relevant code: https://github.com/simonw/datasette/blob/798f075ef9b98819fdb564f9f79c78975a0f71e8/datasette/utils/__init__.py#L609-L625
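That function runs a PRAGMA query against the table on every call, so anything that invokes it repeatedly while streaming pays that cost over and over. A minimal sketch of the pattern (simplified from the linked code, not a verbatim copy; the `Column` fields here just follow SQLite's `table_xinfo` output):

```python
import sqlite3
from collections import namedtuple

# Fields match the columns returned by PRAGMA table_xinfo (SQLite 3.26+)
Column = namedtuple(
    "Column", ("cid", "name", "type", "notnull", "default_value", "is_pk", "hidden")
)


def table_column_details(conn, table):
    # One PRAGMA query per call; cheap in isolation but costly when
    # it runs again for every page of a streamed CSV response
    return [
        Column(*row)
        for row in conn.execute(f"PRAGMA table_xinfo([{table}]);").fetchall()
    ]


conn = sqlite3.connect("covid.db")
print(table_column_details(conn, "ny_times_us_counties"))
```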
Maybe it's because `supports_table_xinfo()` creates a brand new in-memory SQLite connection every time you call it?
https://github.com/simonw/datasette/blob/798f075ef9b98819fdb564f9f79c78975a0f71e8/datasette/utils/sqlite.py#L22-L35
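One way to sanity-check that hypothesis is to time a fresh in-memory connection per call against a reused one (a rough `timeit` sketch; absolute numbers will vary by machine):

```python
import sqlite3
import timeit


def fresh_connection():
    # Pays for a brand new in-memory connection on every call
    return sqlite3.connect(":memory:").execute("select sqlite_version()").fetchone()


conn = sqlite3.connect(":memory:")


def reused_connection():
    # Reuses a single connection, so only the query itself is timed
    return conn.execute("select sqlite_version()").fetchone()


print("fresh: ", timeit.timeit(fresh_connection, number=10_000))
print("reused:", timeit.timeit(reused_connection, number=10_000))
```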
Actually no, I'm caching that already:
https://github.com/simonw/datasette/blob/798f075ef9b98819fdb564f9f79c78975a0f71e8/datasette/utils/sqlite.py#L12-L19
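For reference, the caching pattern at that link is roughly this (a sketch under the same assumption as above, not the exact code): memoize the version lookup so the throwaway in-memory connection only ever happens once per process.

```python
import functools
import sqlite3


@functools.lru_cache(maxsize=None)
def sqlite_version():
    # The throwaway in-memory connection is only opened on the first
    # call; every later call returns the memoized version tuple
    return tuple(
        map(
            int,
            sqlite3.connect(":memory:")
            .execute("select sqlite_version()")
            .fetchone()[0]
            .split("."),
        )
    )


def supports_table_xinfo():
    # PRAGMA table_xinfo was added in SQLite 3.26.0
    return sqlite_version() >= (3, 26, 0)
```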