risinglight icon indicating copy to clipboard operation
risinglight copied to clipboard

refactor(storage): use dynamic trait for storage interfaces

Open wangrunji0408 opened this issue 1 year ago • 3 comments

Signed-off-by: Runji Wang [email protected]

Dynamic async traits are not so bad for performance, as long as they are not called frequently. The advantage is that they are less invasive as they do not require generic parameters everywhere, and they allow for extensible plugins in the future.

This PR uses #[async_trait] to refactor storage traits into dyn traits. It removes <S: Storage> from executors. The only performance critical part is the TxnIterator::next_batch which is called frequently in scanning. We refactor it into a BoxStream to eliminate boxing the future on each call.

Performance comparison on TPCH
tpch time
q1 +5.9%
q17 +5.6%
q18 +5.4%
q13 +4.1%
q11 +3.9%
q14 +3.9%
q7 +3.7%
q4 +3.2%
q3 +3.1%
q10 +2.2%
q9 +1.8%
q5 +1.4%
q15 +1.2%
q16 +1.2%
q19 +1.2%
q6 +0.4%
q12 -0.4%
q20 -0.6%
q22 -2.5%
q2 -2.9%
q8 -2.9%
q21 -5.4%

wangrunji0408 avatar Apr 16 '24 06:04 wangrunji0408

Weakly -1

The current dynamic traits have too many limitations, making them seem like unfinished products.

And pass generic parameters are not really very painful.

TennyZhuang avatar Apr 17 '24 07:04 TennyZhuang

also objected by @mrcroxx

wangrunji0408 avatar Apr 17 '24 09:04 wangrunji0408

also objected by @MrCroxx

The #[async_trait] allocates a boxed future on each call. The overhead can be high for memory-only path. e.g. cache hits.

MrCroxx avatar Apr 17 '24 14:04 MrCroxx

Given that the storage part of the system runs in batches (i.e., we always fetch as many as 1024 rows if possible), I don't think using a dynamic trait would cause a large regression. However, I don't see the code is simplified significantly without static dispatch, except that the S: Storage thing gets removed. Therefore, I'm kind of in the middle of approving or turning down this pull request :(

skyzh avatar Jun 16 '24 19:06 skyzh