refactor(storage): use dynamic trait for storage interfaces
Signed-off-by: Runji Wang [email protected]
Dynamic async traits are not so bad for performance, as long as they are not called frequently. The advantage is that they are less invasive as they do not require generic parameters everywhere, and they allow for extensible plugins in the future.
This PR uses #[async_trait] to refactor storage traits into dyn traits. It removes <S: Storage> from executors. The only performance critical part is the TxnIterator::next_batch which is called frequently in scanning. We refactor it into a BoxStream to eliminate boxing the future on each call.
Performance comparison on TPCH
| tpch | time |
|---|---|
| q1 | +5.9% |
| q17 | +5.6% |
| q18 | +5.4% |
| q13 | +4.1% |
| q11 | +3.9% |
| q14 | +3.9% |
| q7 | +3.7% |
| q4 | +3.2% |
| q3 | +3.1% |
| q10 | +2.2% |
| q9 | +1.8% |
| q5 | +1.4% |
| q15 | +1.2% |
| q16 | +1.2% |
| q19 | +1.2% |
| q6 | +0.4% |
| q12 | -0.4% |
| q20 | -0.6% |
| q22 | -2.5% |
| q2 | -2.9% |
| q8 | -2.9% |
| q21 | -5.4% |
Weakly -1
The current dynamic traits have too many limitations, making them seem like unfinished products.
And pass generic parameters are not really very painful.
also objected by @mrcroxx
also objected by @MrCroxx
The #[async_trait] allocates a boxed future on each call. The overhead can be high for memory-only path. e.g. cache hits.
Given that the storage part of the system runs in batches (i.e., we always fetch as many as 1024 rows if possible), I don't think using a dynamic trait would cause a large regression. However, I don't see the code is simplified significantly without static dispatch, except that the S: Storage thing gets removed. Therefore, I'm kind of in the middle of approving or turning down this pull request :(