heroic
Investigate & resolve nondeterministic build errors
- I am a heroic developer
- Who wants to have deterministic builds
- So that I can retain my sanity and be productive instead of chasing random build errors
Design & Implementation Notes
- See Heroic Build Issues Analysis gdoc
- See this Slack thread for details
- See this build for a full technical example
- Be aware of the network errors that CircleCI seems to have regularly (but not always), and of other weird, random issues that CircleCI seems to throw up
- I think this occurs irrespective of the branch (though it might only pertain to feature/add-bigtable-timeout-settings-refactored):

```
feature/add-bigtable-timeout-settings-refactored com.spotify.heroic.GrpcClusterQueryIT > distributedFilterQueryTest FAILED
java.lang.IllegalStateException: failed to create a child event loop
Caused by:
io.netty.channel.ChannelException: failed to open a new selector
Caused by:
java.io.IOException: Too many open files
java.lang.NullPointerException
```
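`java.io.IOException: Too many open files` means the process has reached its per-process limit on open file descriptors (the soft `RLIMIT_NOFILE`, as shown by `ulimit -n`). Each Netty `NioEventLoop` opens its own NIO `Selector`, and every selector holds at least one descriptor, which fits the "failed to open a new selector" cause above. A minimal sketch for checking how close a test JVM gets to that limit, for example by logging it before and after an IT starts its cluster, could look like the following. `FdUsage` and `currentFdUsage` are hypothetical names, and the code assumes a HotSpot/OpenJDK runtime, since it relies on the JDK-specific `com.sun.management.UnixOperatingSystemMXBean`, which is only exposed on Unix-like platforms.

```kotlin
import com.sun.management.UnixOperatingSystemMXBean
import java.lang.management.ManagementFactory

/** Hypothetical snapshot of this JVM's file-descriptor usage. */
data class FdUsage(val open: Long, val max: Long) {
    /** How many more descriptors can be opened before hitting the limit. */
    val headroom: Long get() = max - open
}

/**
 * Reads the current and maximum open file-descriptor counts for this process.
 * Returns null on platforms (e.g. Windows) where the JDK does not expose
 * UnixOperatingSystemMXBean.
 */
fun currentFdUsage(): FdUsage? {
    val os = ManagementFactory.getOperatingSystemMXBean()
    return if (os is UnixOperatingSystemMXBean) {
        FdUsage(open = os.openFileDescriptorCount, max = os.maxFileDescriptorCount)
    } else {
        null
    }
}
```

Printing `currentFdUsage()` on an affected and an unaffected machine might help explain why only some machines hit the error.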
Update
I found that removing the 4x multiplier from this method:

```kotlin
// Creates 4 event loops per available processor; each NioEventLoop opens its own NIO Selector
@Provides
@GrpcRpcScope
@Named("worker")
fun worker() = NioEventLoopGroup(Runtime.getRuntime().availableProcessors() * 4)
```

stops the exception from being thrown. But many questions remain unanswered:
- Why does this happen only on my and Sergey's machines?
- Why does commenting out a seemingly innocuous IT (testbasicWithNoDistribution) also stop the exception from being thrown?
- Is the problem we're encountering pertinent to production operation of Heroic, a quirk of our machines, or just something that test code will exhibit?
- What is the "correct" fix for this? Removing the 4x multiplier is a poor workaround at best.
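One possible direction for the last question (a sketch, not a tested fix) is to make the worker group size configurable instead of hard-coding the 4x multiplier, so that test setups can use a smaller value while production keeps its current sizing. The helpers `workerThreadCount` and `estimatedSelectorFds`, the `maxThreads` cap and the `fdsPerSelector` default are all hypothetical; `fdsPerSelector` is an assumption because the number of descriptors a selector holds depends on the OS and JDK. The estimate also suggests a possible, unverified answer to the first two questions: the selector count grows with the number of cores and with every node an IT starts in the same JVM, so machines with more cores, or test runs that start more nodes, would reach the open-file limit sooner.

```kotlin
/**
 * Hypothetical sizing helper for the "worker" NioEventLoopGroup.
 * Each event loop opens its own NIO Selector, so this is also roughly the
 * number of selectors created per group.
 */
fun workerThreadCount(
    availableProcessors: Int = Runtime.getRuntime().availableProcessors(),
    multiplier: Int = 4,
    maxThreads: Int? = null
): Int {
    require(availableProcessors > 0) { "availableProcessors must be positive" }
    require(multiplier > 0) { "multiplier must be positive" }
    val requested = availableProcessors * multiplier
    // Apply the optional upper bound (e.g. a small value for ITs), never going below 1.
    return if (maxThreads == null) requested else minOf(requested, maxThreads).coerceAtLeast(1)
}

/**
 * Rough estimate of descriptors held by selectors across all event loop groups
 * in one JVM. fdsPerSelector is an assumption, not a measured value.
 */
fun estimatedSelectorFds(
    threadsPerGroup: Int,
    groupsPerNode: Int,
    nodes: Int,
    fdsPerSelector: Int = 1
): Int = threadsPerGroup * groupsPerNode * nodes * fdsPerSelector
```

Wired into the provider, this might look like `fun worker() = NioEventLoopGroup(workerThreadCount(maxThreads = maxWorkerThreads))`, where `maxWorkerThreads` is a hypothetical configuration value. For comparison, Netty's no-argument `NioEventLoopGroup()` uses 2 × available processors unless the `io.netty.eventLoopThreads` system property overrides it.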
(CC @malish8632)