Flaky Test: JanusGraphIndexTest.testDisableAndDiscardManuallyAndDropEnabledIndex
Flaky Test
- Test Name: JanusGraphIndexTest.testDisableAndDiscardManuallyAndDropEnabledIndex
- Link: https://github.com/JanusGraph/janusgraph/blob/2c7007653a74ba01b938c7803a17cf17b1e3a9fe/janusgraph-backend-testutils/src/main/java/org/janusgraph/graphdb/JanusGraphIndexTest.java#L3771
- Branch: master, PR: #3650
- Notes: This failed for the job tests (es, -Pjava-11, -Pelasticsearch8, es8, 11)
Stack Trace
Error:  Failures: 
Error:    BerkeleyElasticsearchTest>JanusGraphIndexTest.testDisableAndDiscardManuallyAndDropEnabledIndex:3788->JanusGraphIndexTest.registerIndex:3945 expected: <true> but was: <false>
cc @rngcntr as you've just added this test in #3362.
registerIndex uses a timeout of 10 seconds to wait for the index operation to succeed. Given that the test has passed until now, I suspect that the operation simply timed out in this particular run.
Increasing the timeout could solve it for now, but that is hard to evaluate, since a simple re-run apparently resolved the failure as well. To me, it definitely makes sense to keep at least some kind of timeout to prevent tests from getting stuck and needlessly consuming resources.
I think I already had to restart a workflow run at least once before because of one of these index tests, but I'm not 100% sure. If the timeout is just there to ensure that the tests don't become stuck in case something is really broken, then I think that we should increase the timeout a lot. CI runs are really unreliable in my experience so we probably have to expect that a simple operation can take longer than 10 seconds there. If the timeout really only kicks in if the implementation is broken, then it also doesn't hurt to increase it to something like 1 or 2 minutes, right?
It's definitely a big pain point that we have to restart CI jobs manually all the time because of such flaky tests, so we should try our best to make them run reliably even under poor conditions, such as a CI system with minimal resources.
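To illustrate the trade-off discussed above, here is a minimal sketch of a polling wait with a generous upper bound. It is not the actual registerIndex implementation, which is not shown in this thread; the class AwaitSketch, the method awaitCondition, and the poll interval are hypothetical names and values chosen for illustration. The point is that the wait returns as soon as the condition holds, so a 2-minute limit should only cost time when something is really broken.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of JanusGraph: illustrates a bounded polling wait.
public final class AwaitSketch {
    private AwaitSketch() {}

    /**
     * Polls the condition until it returns true or the timeout elapses.
     * Returns true on success and false if the timeout was reached first.
     */
    public static boolean awaitCondition(BooleanSupplier condition,
                                         Duration timeout,
                                         Duration pollInterval) throws InterruptedException {
        final long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true; // finished early, the remaining timeout is never spent
            }
            final long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false; // upper bound reached, the caller can fail the test
            }
            // Sleep for one poll interval, but never past the deadline.
            final long remainingMillis = Math.max(1, remainingNanos / 1_000_000);
            Thread.sleep(Math.min(pollInterval.toMillis(), remainingMillis));
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final long start = System.nanoTime();
        // Simulated slow operation (assumption): becomes ready after roughly 200 ms.
        BooleanSupplier ready =
            () -> System.nanoTime() - start > Duration.ofMillis(200).toNanos();
        boolean succeeded = awaitCondition(ready, Duration.ofMinutes(2), Duration.ofMillis(50));
        System.out.println("succeeded=" + succeeded);
    }
}
```

With this shape, a healthy run pays only for the time the operation actually takes plus at most one poll interval, while a stuck run is still cut off after the configured limit.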
This is unfortunately still flaky:
Error:  org.janusgraph.diskstorage.es.BerkeleyElasticsearchTest.testDisableAndDiscardManuallyAndDropEnabledIndex  Time elapsed: 61.182 s  <<< FAILURE!
org.opentest4j.AssertionFailedError: expected: <true> but was: <false>
	at org.janusgraph.graphdb.JanusGraphIndexTest.registerIndex(JanusGraphIndexTest.java:3943)
	at org.janusgraph.graphdb.JanusGraphIndexTest.testDisableAndDiscardManuallyAndDropEnabledIndex(JanusGraphIndexTest.java:3786)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at java.util.ArrayList.forEach(ArrayList.java:1259)
	at java.util.ArrayList.forEach(ArrayList.java:1259)
Error:  Failures: 
Error:    BerkeleyElasticsearchTest>JanusGraphIndexTest.testDisableAndDiscardManuallyAndDropEnabledIndex:3786->JanusGraphIndexTest.registerIndex:3943 expected: <true> but was: <false>
Last seen in PR #3945 targeting master in the job tests (es, -Pelasticsearch8, es8, 8).