Support QPS/TPS reporting
Describe the solution you'd like
The BitFunnel paper contained some additions to the partitioned_elias_fano codebase which allowed measuring queries per second with multiple threads. See here: https://github.com/BitFunnel/partitioned_elias_fano/blob/master/Runner/QueryLogRunner.cpp#L56
In addition to reporting per-query metrics, it would be nice if pisa could provide such a multi-threaded QPS benchmark tool.
Additional context
In practice, one would be interested in metrics such as "if my p95 latency requirement is 100ms, how much QPS can I achieve with this index and this hardware configuration?"
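For illustration, that kind of report can be derived from the per-query latencies plus the wall-clock time of the whole run. A minimal sketch (the `summarize` helper and its types are hypothetical, not existing pisa code):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: summarize a benchmark run from per-query latencies
// and the wall-clock duration of the whole batch (assumes at least one query).
struct ThroughputReport {
    double qps;     // queries per second over the whole run
    double p95_ms;  // 95th-percentile latency in milliseconds
};

inline ThroughputReport summarize(std::vector<double> latencies_ms, double wall_clock_s)
{
    std::sort(latencies_ms.begin(), latencies_ms.end());
    // Nearest-rank percentile over the sorted latencies.
    auto idx = static_cast<std::size_t>(0.95 * (latencies_ms.size() - 1));
    return {latencies_ms.size() / wall_clock_s, latencies_ms[idx]};
}
```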
@amallia @elshize @mpetri
I am thinking about this at the moment. Do we want to try to use TBB to facilitate the multithreading, or should we try to follow the example from the BitFunnel experiments? (They seem to roll their own thread synchronization so they avoid timing the overhead of spooling up threads prior to processing.) I suspect we can make a simple one-function addition to the `queries` binary to allow threads.
I know that `evaluate_queries` does parallel processing, but it uses the `tbb::parallel_for` approach, which might add some overhead to the timings? Not entirely sure...
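For context, this is roughly what timing through `tbb::parallel_for` looks like. A minimal sketch, assuming the queries are already parsed into some `Query` type (all names here are illustrative, not pisa's actual API); the point is that TBB's task scheduling happens inside the timed region:

```cpp
#include <tbb/parallel_for.h>

#include <chrono>
#include <cstddef>
#include <vector>

struct Query {};                     // stand-in for a parsed query
void process_query(Query const&) {}  // stand-in for the actual query processing

// Time the whole batch while TBB schedules iterations onto its thread pool.
double run_with_parallel_for(std::vector<Query> const& queries)
{
    auto start = std::chrono::steady_clock::now();
    tbb::parallel_for(std::size_t(0), queries.size(), [&](std::size_t i) {
        process_query(queries[i]);   // scheduling overhead is included in the timing
    });
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();
}
```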
@JMMackenzie When you're talking about overhead, do you mean the overhead of setting up thread pool and such, or also overhead of starting each thread?
If I understand correctly, the goal is this: given a set of queries and a number of threads, run the queries in the specified order and report the query throughput. Is this correct?
If this is the case, we could spin up a number of workers that would wait on a concurrent queue for a query in a loop. This way we could even measure in different modes:
- begin to end,
- each execution separately,
- report data for each query: time of arrival, time finished (so also latency)
I personally think that measuring just the query processing without any synchronization is a bit unfair, because concurrent execution will always have overhead over sequential execution. But as I said, we can have different ways of measuring if we really want.
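To make the different modes concrete: one option is to record an arrival and a completion timestamp for every query and derive all three reports from the same records. A rough sketch with hypothetical names:

```cpp
#include <chrono>
#include <vector>

using Clock = std::chrono::steady_clock;

// One record per executed query, filled in by whichever worker ran it.
struct QueryRecord {
    Clock::time_point arrival;   // when the query was popped off the queue
    Clock::time_point finished;  // when processing completed
};

// Mode 1: begin-to-end throughput over the whole run.
double overall_qps(std::vector<QueryRecord> const& records,
                   Clock::time_point run_start, Clock::time_point run_end)
{
    return records.size() / std::chrono::duration<double>(run_end - run_start).count();
}

// Modes 2 and 3: per-query latency (arrival to finish), reported individually
// or aggregated into percentiles.
double latency_ms(QueryRecord const& r)
{
    return std::chrono::duration<double, std::milli>(r.finished - r.arrival).count();
}
```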
> @JMMackenzie When you're talking about overhead, do you mean the overhead of setting up thread pool and such, or also overhead of starting each thread?
Right, I mean starting each thread.
> If I understand correctly, the goal is this: given a set of queries and a number of threads, run the queries in the specified order and report the query throughput. Is this correct?
I agree, this is the goal.
> If this is the case, we could spin up a number of workers that would wait on a concurrent queue for a query in a loop. This way we could even measure in different modes:
>
> - begin to end,
> - each execution separately,
> - report data for each query: time of arrival, time finished (so also latency)
>
> I personally think that measuring just the query processing without any synchronization is a bit unfair, because concurrent execution will always have overhead over sequential execution. But as I said, we can have different ways of measuring if we really want.
Okay, I agree and I am in favor of your suggested approach. I think we have the capacity to do this via TBB, right?
Yes, I believe TBB should do just fine. It provides two concurrent queue types: one unbounded and one bounded. Since all queries are currently loaded into memory at the beginning anyway, we don't need to bound the queue.
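A rough sketch of that setup, assuming the unbounded `tbb::concurrent_queue` and plain `std::thread` workers (`run_benchmark`, `Query`, and `process_query` are placeholders, not an existing pisa interface). Since the full query log is already in memory, the queue can be filled up front and the workers simply drain it:

```cpp
#include <tbb/concurrent_queue.h>

#include <thread>
#include <vector>

struct Query {};                     // stand-in for a parsed query
void process_query(Query const&) {}  // stand-in for the actual query processing

void run_benchmark(std::vector<Query> const& queries, unsigned num_threads)
{
    // Preload the unbounded queue with every query before any worker starts.
    tbb::concurrent_queue<Query> queue;
    for (auto const& q : queries) {
        queue.push(q);
    }

    // Each worker pops queries in a loop until the queue is empty.
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&queue] {
            Query q;
            while (queue.try_pop(q)) {
                process_query(q);
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }
}
```

Begin-to-end and per-query timestamps (as in the earlier sketch) could then be taken around the worker launch/join and inside the worker loop, respectively.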