greptimedb icon indicating copy to clipboard operation
greptimedb copied to clipboard

fix: `test_dist_table_scan` block

Open MichaelScofield opened this issue 2 years ago • 1 comments

I hereby agree to the terms of the GreptimeDB CLA

What's changed and what's your intention?

test_dist_table_scan could be blocking for multiple reasons:

  1. the tasks that were producing the RecordBatch result items ignored the sending error
    • test_dist_table_scan uses CoalescePartitionsExec to merge partitions data
    • CoalescePartitionsExec will spawn multiple tasks that each of which gathers one partition's RecordBatch result to a final output stream, through a mpsc channel
    • the task ignores the sending error, thus even if the receiver has decided to quit, the sender(the task) don't care (fixed in this PR: https://github.com/apache/arrow-datafusion/pull/3276/files)
    • the task thus keeps polling the stream, in a while let Some loop
  2. the stream being polled always returns Some if the stream is failed to init
    • never return None means we have an infinite loop in the above task
    • casuing tokio runtime busy, other tasks like heartbeats are not likely to proceed (hence the many error backtrace logs)

The fix is simple, upgrade to newer version of DataFusion(done), and make the stream able to emit None.

Checklist

  • [ ] I have written the necessary rustdoc comments.
  • [x] I have added the necessary unit tests and integration tests.

Refer to a related PR or issue link (optional)

MichaelScofield avatar Dec 16 '22 10:12 MichaelScofield

Codecov Report

Merging #761 (61e1f2e) into develop (66bca11) will decrease coverage by 0.07%. The diff coverage is 97.05%.

@@             Coverage Diff             @@
##           develop     #761      +/-   ##
===========================================
- Coverage    85.36%   85.28%   -0.08%     
===========================================
  Files          414      414              
  Lines        54167    54388     +221     
===========================================
+ Hits         46240    46387     +147     
- Misses        7927     8001      +74     
Flag Coverage Δ
rust 85.28% <97.05%> (-0.08%) :arrow_down:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/common/recordbatch/src/adapter.rs 92.25% <97.02%> (+15.86%) :arrow_up:
src/common/recordbatch/src/error.rs 64.28% <100.00%> (ø)
src/object-store/src/test_util.rs 0.00% <0.00%> (-100.00%) :arrow_down:
tests-integration/src/test_util.rs 87.98% <0.00%> (-11.54%) :arrow_down:
src/datanode/src/instance.rs 32.22% <0.00%> (-11.12%) :arrow_down:
src/datanode/src/sql/insert.rs 80.43% <0.00%> (-9.79%) :arrow_down:
src/servers/src/grpc.rs 91.17% <0.00%> (-2.95%) :arrow_down:
src/query/src/datafusion/error.rs 92.00% <0.00%> (-1.25%) :arrow_down:
src/meta-srv/src/service/heartbeat.rs 86.31% <0.00%> (-1.06%) :arrow_down:
src/frontend/src/instance/distributed.rs 79.92% <0.00%> (-0.90%) :arrow_down:
... and 17 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

codecov[bot] avatar Dec 16 '22 10:12 codecov[bot]

@waynexia all comments resolved, PTAL

MichaelScofield avatar Dec 17 '22 09:12 MichaelScofield