sqlite icon indicating copy to clipboard operation
sqlite copied to clipboard

Virtual table implementations are slow

Open anacrolix opened this issue 2 years ago • 6 comments

I have a virtual table implementation that generates 40 rows on average. With 12 million inputs it gets very slow. I think I've tracked the issue down to calling from C to Go being very slow. 60% of the CPU time is spent scheduling a goroutine to handle the incoming call.

One possible solution is to have an optimized version of a vtable cursor that doesn't need to call into Go for column, next, rowid, eof and maybe close (but not filter). I think filter could return a fully populated set of rows and columns and C code could walk that. This would reduce the C to Go calls by 5 per row, so about 200 calls per filter for my use case.

anacrolix avatar Oct 27 '22 11:10 anacrolix

I like the idea of a buffered cursor! It could be useful in cases where you anyways perform a full-fetch in the Filter() function, then use Next() + Column() + Eof() functions to increment a counter and read the value at the current counter.

That being said, I'd still like it to be an opt-in rather than the default behavior, as the current behavior is simple and generic enough to support this and other use-cases too.

Also, for the numbers you've shared, can you provide a reproducible example and / or a cpu- and memory- profile?

riyaz-ali avatar Oct 27 '22 16:10 riyaz-ali

Yeah it would definitely be opt-in. I suspect an alternative interface to VirtualCursor, possibly an alternate Open method.

anacrolix avatar Oct 27 '22 22:10 anacrolix

pprof002

anacrolix avatar Oct 27 '22 23:10 anacrolix

Trying the idea out in https://github.com/anacrolix/riyaz-ali-sqlite/compare/master...anacrolix:riyaz-ali-sqlite:array-filter?expand=1, (I had put my C callbacks in their own file, or I got duplicate errors, or no symbol if I made them static). It's about 3x faster for a test using a count(*). That's to be expected, as for every filter, I lose a next and eof call to Go. If columns were used, it would speed up by exactly how many extra column calls were made. Unfortunately it's still very slow, just the cost of hitting filter is very high. The sys cost dropped a lot tho, 39s down to 3.8s, not sure why. Probably less allocation. Here's the changed CPU pprof.

pprof003

anacrolix avatar Oct 28 '22 03:10 anacrolix

My estimates were out. The speed up is much larger (an intermediate function was causing a significant CPU cost in my tests that is unrelated to the vtable). For every filter, I was having about 40 eof and 40 next calls. Removing those reduced my runtime from ~2m30s to ~50s, where in both cases about ~35s were overhead elsewhere. So approximately an 8x speed up. Opting in to allocating everything up front is fine, see the branch. Switching to Rust however and the overhead is neglible. The speedup there is off the charts. It's definitely worth noting for anyone else trying this that the scheduler cost from C to Go is very very high.

anacrolix avatar Nov 02 '22 04:11 anacrolix

It's also worth noting that virtual tables are just slow in general in sqlite. I'm not sure why, but it's quite noticeable and avoiding them entirely (even if they're implemented in Rust, so presumably C too), and doing the equivalent behaviour in your application directly if possible.

anacrolix avatar Nov 17 '22 00:11 anacrolix