frostdb icon indicating copy to clipboard operation
frostdb copied to clipboard

#88 execute queries in parallel

Open albertlockett opened this issue 2 years ago • 3 comments

https://github.com/polarsignals/arcticdb/issues/88#issuecomment-1148997165

This PR adds parallelism in the table iterator. After the rowgroups are collected, a number of goroutines are spawned to process the rowgroups in parallel by calling the passed iterator (e.g. the physical plan).

It might be the case that the iterator keeps internal state in a way that is not thread-safe (e.g. physical plan's HashAggregate). In this case, it makes sense for the iterator to be created per thread (i.e. one instance of physical plan per thread). To accommodate this, the interface to table iterator has changed to accept an interface IteratorProvider with a method Iterator() that generates the iterator.

For aggregation queries, the aggregations happen in parallel and a Finisher has been added to combine the results of the multiple aggregations.

It's also possible that the final query callback could now be called concurrently. A new parameter is added to the query builder called Options which has a field to control this behaviour.

query.NewEngine(memory.DefaultAllocator, db.TableProvider()).
  ScanTable("test_table").
  // NEW:
  Options(&query.Options{SyncCallback: false}) // default is 'true'
  Execute(context.Background(), func(ar arrow.Record) error {
    // controls if this function can execute is synchronized or concurrent

albertlockett avatar Jun 07 '22 17:06 albertlockett

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Jun 09 '22 20:06 CLAassistant

hey @metalmatze , following up from your comments on #88. I have this as draft because I had a few outstanding work items:

  1. The current code can call the query callback in parallel, which could lead to unexpected behaviour if the callback isn't written to be threadsafe. I was going to try to add some synchronization around that
  2. fixing the issues mentioned in code review :)

I'll get those cleaned up asap and move this out of draft status

albertlockett avatar Jun 16 '22 14:06 albertlockett

@brancz the changes discussed in the comments above have been made. Would you mind reviewing again?

albertlockett avatar Jul 18 '22 01:07 albertlockett

Closing this since https://github.com/polarsignals/frostdb/pull/202 has merged. Thank you @albertlockett for your contributions! All your initial work helped to inform the final design!

thorfour avatar Oct 05 '22 13:10 thorfour

Very cool thanks nice work! @thorfour @metalmatze

albertlockett avatar Oct 05 '22 14:10 albertlockett