Streaming CSV spends a lot of time in `table_column_details`
At least I think it does. I tried running `py-spy top -p $PID` against a Datasette process that was trying to do:

```
datasette covid.db --get '/covid/ny_times_us_counties.csv?_size=10&_stream=on'
```
While investigating:
- #1355

I spotted this:
```
datasette covid.db --get '/covid/ny_times_us_counties.csv?_size=10&_stream=on' (python v3.10.2)
Total Samples 5800
GIL: 71.00%, Active: 98.00%, Threads: 4

  %Own   %Total  OwnTime  TotalTime  Function (filename:line)
  8.00%   8.00%    4.32s     4.38s   sql_operation_in_thread (datasette/database.py:212)
  5.00%   5.00%    3.77s     3.93s   table_column_details (datasette/utils/__init__.py:614)
  6.00%   6.00%    3.72s     3.72s   _worker (concurrent/futures/thread.py:81)
  7.00%   7.00%    2.98s     2.98s   _read_from_self (asyncio/selector_events.py:120)
  5.00%   6.00%    2.35s     2.49s   detect_fts (datasette/utils/__init__.py:571)
  4.00%   4.00%    1.34s     1.34s   _write_to_self (asyncio/selector_events.py:140)
```
Relevant code: https://github.com/simonw/datasette/blob/798f075ef9b98819fdb564f9f79c78975a0f71e8/datasette/utils/__init__.py#L609-L625
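That function runs a PRAGMA query against the table on every call, so anything that invokes it repeatedly while streaming pays that cost over and over. A minimal sketch of the pattern (simplified from the linked code, not a verbatim copy; the `Column` fields here just follow SQLite's `table_xinfo` output):

```python
import sqlite3
from collections import namedtuple

# Fields match the columns returned by PRAGMA table_xinfo (SQLite 3.26+)
Column = namedtuple(
    "Column", ("cid", "name", "type", "notnull", "default_value", "is_pk", "hidden")
)


def table_column_details(conn, table):
    # One PRAGMA query per call; cheap in isolation but costly when
    # it runs again for every page of a streamed CSV response
    return [
        Column(*row)
        for row in conn.execute(f"PRAGMA table_xinfo([{table}]);").fetchall()
    ]


conn = sqlite3.connect("covid.db")
print(table_column_details(conn, "ny_times_us_counties"))
```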
Maybe it's because `supports_table_xinfo()` creates a brand new in-memory SQLite connection every time you call it?
https://github.com/simonw/datasette/blob/798f075ef9b98819fdb564f9f79c78975a0f71e8/datasette/utils/sqlite.py#L22-L35
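One way to sanity-check that hypothesis is to time a fresh in-memory connection per call against a reused one (a rough `timeit` sketch; absolute numbers will vary by machine):

```python
import sqlite3
import timeit


def fresh_connection():
    # Pays for a brand new in-memory connection on every call
    return sqlite3.connect(":memory:").execute("select sqlite_version()").fetchone()


conn = sqlite3.connect(":memory:")


def reused_connection():
    # Reuses a single connection, so only the query itself is timed
    return conn.execute("select sqlite_version()").fetchone()


print("fresh: ", timeit.timeit(fresh_connection, number=10_000))
print("reused:", timeit.timeit(reused_connection, number=10_000))
```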
Actually no, I'm caching that already:
https://github.com/simonw/datasette/blob/798f075ef9b98819fdb564f9f79c78975a0f71e8/datasette/utils/sqlite.py#L12-L19
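For reference, the caching pattern at that link is roughly this (a sketch under the same assumption as above, not the exact code): memoize the version lookup so the throwaway in-memory connection only ever happens once per process.

```python
import functools
import sqlite3


@functools.lru_cache(maxsize=None)
def sqlite_version():
    # The throwaway in-memory connection is only opened on the first
    # call; every later call returns the memoized version tuple
    return tuple(
        map(
            int,
            sqlite3.connect(":memory:")
            .execute("select sqlite_version()")
            .fetchone()[0]
            .split("."),
        )
    )


def supports_table_xinfo():
    # PRAGMA table_xinfo was added in SQLite 3.26.0
    return sqlite_version() >= (3, 26, 0)
```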