OfflineClusterIntegrationTest.testInvertedIndexTriggering is flaky

Open richardstartin opened this issue 3 years ago • 15 comments

https://github.com/apache/pinot/runs/5240101919?check_suite_focus=true

richardstartin, Feb 18 '22

@klsince Please take a look at this test and see if it still applies

Jackie-Jiang, May 10 '22

Will take a look.

klsince, May 11 '22

Not able to reproduce the failure locally, and I haven't seen this test case being flaky recently. Let me know if you've noticed any recent failures of this test case.

klsince, May 11 '22

Sampled a few recently failed PR builds and didn't find this test among the failures. I'm closing this for now.

klsince, May 16 '22

I encountered this today!

richardstartin, May 16 '22

I think this is due to the iteration order of the hash map. @richardstartin Did you run into this locally or in GitHub Actions? If there is a link, can you post it here?

Jackie-Jiang, May 16 '22
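The hash map hypothesis can be illustrated with a minimal sketch. It is not the test's actual code: the class `DeterministicSegmentOrder`, its helpers, and the sample sizes are hypothetical. The `HashMap` Javadoc states that the class makes no guarantees about iteration order, so logic that picks "the first" segment from a `HashMap` can behave differently when the JDK or the map's contents change; copying the keys into a `TreeMap` gives a stable, sorted order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch only: segment names come from the thread, sizes are illustrative.
final class DeterministicSegmentOrder {

  // HashMap makes no ordering guarantee, so copy the entries into a TreeMap
  // and return the segment names in natural (lexicographic) order.
  static List<String> sortedSegmentNames(Map<String, Long> segmentSizes) {
    return new ArrayList<>(new TreeMap<>(segmentSizes).keySet());
  }

  // Hypothetical helper: pick a segment deterministically instead of taking
  // whatever a HashMap iterator happens to return first.
  // Assumes the map is non-empty.
  static String pickSegment(Map<String, Long> segmentSizes) {
    return sortedSegmentNames(segmentSizes).get(0);
  }

  public static void main(String[] args) {
    Map<String, Long> segmentSizes = new HashMap<>();
    segmentSizes.put("mytable_16405_16435_2 %", 200L);
    segmentSizes.put("mytable_16071_16101_3 %", 100L);
    // Prints "mytable_16071_16101_3 %" regardless of HashMap iteration order.
    System.out.println(pickSegment(segmentSizes));
  }
}
```

Sorting by name is only one option; a `LinkedHashMap` would instead preserve insertion order, which helps only if the insertion order itself is deterministic.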

https://github.com/apache/pinot/runs/6453127269?check_suite_focus=true#step:5:30195

richardstartin, May 17 '22

Tried a fix: https://github.com/apache/pinot/pull/8739

klsince, May 19 '22

Got another failure after the fix: https://github.com/apache/pinot/runs/6660284410?check_suite_focus=true

Jackie-Jiang, May 31 '22

Looking into it with this debug PR: https://github.com/apache/pinot/pull/8807

klsince, May 31 '22

The PR was built twice with no failures. Maybe we can land it (the debug PR above) for now, so it prints more information to debug the failure when it happens again.

klsince, May 31 '22

Still flaky: https://github.com/apache/pinot/runs/6754124273?check_suite_focus=true

KKcorps, Jun 07 '22

Some quick findings:

From the error message (I will add the segment name to it):

java.lang.AssertionError: Table size: 20900902 should increase after adding inverted index, as compared with 20937551 expected [true] but found [false]

But based on my local run, the table size should be 20949859.

The gap between 20949859 and 20900902 is 48957 bytes, which looks like a missing metadata.properties file: some segments have a metadata.properties file of 48956 or 48957 bytes.
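The arithmetic above can be checked with a short sketch. The class `TableSizeCheck` and the `sizeIncreased` helper are hypothetical; `sizeIncreased` only mirrors the condition implied by the assertion message ("should increase ... expected [true] but found [false]"), not the test's actual code.

```java
// Sketch: re-check the size arithmetic from the failing assertion.
final class TableSizeCheck {
  // Size in bytes of the metadata.properties file listed for mytable_16405_16435_2 %.
  static final long METADATA_PROPERTIES_SIZE = 48_957L;

  // Mirrors the condition implied by the assertion message:
  // the table size must grow after adding the inverted index.
  static boolean sizeIncreased(long sizeBefore, long sizeAfter) {
    return sizeAfter > sizeBefore;
  }

  public static void main(String[] args) {
    long sizeBefore = 20_937_551L;    // "as compared with" value in the error message
    long observedAfter = 20_900_902L; // size reported by the failing run
    long expectedAfter = 20_949_859L; // size observed in a local run

    System.out.println("gap = " + (expectedAfter - observedAfter)); // gap = 48957
    System.out.println(sizeIncreased(sizeBefore, observedAfter));   // false
    // Adding back one metadata.properties file makes the check pass.
    System.out.println(sizeIncreased(sizeBefore, observedAfter + METADATA_PROPERTIES_SIZE)); // true
  }
}
```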

ls -l ... mytable_OFFLINE/mytable_16071_16101_3 %/v3:
total 3472
-rw-r--r--  1 xiaobing  staff  1706093 Jun  7 11:40 columns.psf
-rw-r--r--  1 xiaobing  staff       16 Jun  7 11:40 creation.meta
-rw-r--r--  1 xiaobing  staff    13278 Jun  7 11:40 index_map
-rw-r--r--  1 xiaobing  staff    48956 Jun  7 11:40 metadata.properties


ls -l .../mytable_OFFLINE/mytable_16405_16435_2 %/v3:
total 3336
-rw-r--r--  1 xiaobing  staff  1637322 Jun  7 11:40 columns.psf
-rw-r--r--  1 xiaobing  staff       16 Jun  7 11:39 creation.meta
-rw-r--r--  1 xiaobing  staff    13290 Jun  7 11:40 index_map
-rw-r--r--  1 xiaobing  staff    48957 Jun  7 11:40 metadata.properties

klsince, Jun 07 '22
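To see how a single missing file would produce a gap of exactly one file's size, here is a minimal sketch of a size computation that sums the regular files under a segment directory. It assumes, hypothetically, that the reported size is a plain sum of file sizes and that `metadata.properties` can be absent at the moment the size is read; Pinot's actual table-size computation may work differently, and the class name and file sizes used here are made up.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical size computation: sum of all regular files under a directory.
final class SegmentDirSize {

  // Walks the directory tree and adds up the sizes of regular files only.
  static long directorySize(Path dir) {
    try (Stream<Path> paths = Files.walk(dir)) {
      return paths.filter(Files::isRegularFile).mapToLong(SegmentDirSize::fileSize).sum();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  private static long fileSize(Path file) {
    try {
      return Files.size(file);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) throws IOException {
    Path segmentDir = Files.createTempDirectory("segment");
    Files.write(segmentDir.resolve("columns.psf"), new byte[1000]);
    Files.write(segmentDir.resolve("metadata.properties"), new byte[300]);
    long withMetadata = directorySize(segmentDir);
    // Simulate metadata.properties being absent when the size is read.
    Files.delete(segmentDir.resolve("metadata.properties"));
    long withoutMetadata = directorySize(segmentDir);
    System.out.println(withMetadata + " -> " + withoutMetadata); // 1300 -> 1000
  }
}
```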

I'll wait and hope to catch this failure again: https://github.com/apache/pinot/pull/8853

klsince, Jun 07 '22

Another failure: https://github.com/apache/pinot/actions/runs/3094335423/jobs/5007603397

Error:  Failures: 
Error:    OfflineClusterIntegrationTest.testInvertedIndexTriggering:530 Table size: 20801751 should increase after adding inverted index on segment: mytable_16071_16101_3 %, as compared with 20938679 expected [true] but found [false]

Jackie-Jiang, Sep 21 '22

Haven't run into this for a year. Closing it.

Jackie-Jiang, Nov 14 '23