Faust app hangs waiting for recovery
I am experiencing a problem that prevents my Faust app from starting properly. The issue occurs with Python 3.6 and Python 3.7 but only on my CI server. I cannot reproduce the issue on my working machine.
Any ideas how to get to the bottom of this?
Checklist
- [x] I have included information about relevant versions
- [ ] I have verified that the issue persists when using the master branch of Faust.
Steps to reproduce
- Start a fresh Kafka instance (this is important; subsequent runs will succeed)
- Run the minimal example with the provided commands
Here is a minimal example:
import faust
app = faust.App('myapp', broker='kafka://localhost:9094', store='memory://')
topic1 = app.topic('topic1')
topic2 = app.topic('topic2') # Comment this line to make the app start without hanging
topic1_table = app.Table('topic1')
topic2_table = app.Table('topic2') # Comment this line to make the app start without hanging
@app.agent(topic1)
async def myagent(stream):
    async for item in stream:
        print(item)
Start it with faust --loglevel debug --debug --app api.app worker --without-web.
Expected behavior
App starts normally.
Actual behavior
App hangs when starting.
Full traceback
[2018-11-21 12:16:42,195: WARNING]: [^Worker]: Cannot start console: aiomonitor is not installed
[2018-11-21 12:16:42,195: INFO]: [^Worker]: Starting...
[2018-11-21 12:16:42,196: INFO]: [^-App]: Starting...
[2018-11-21 12:16:42,196: INFO]: [^--Monitor]: Starting...
[2018-11-21 12:16:42,196: DEBUG]: [^--Monitor]: Started.
[2018-11-21 12:16:42,196: INFO]: [^--Producer]: Starting...
[2018-11-21 12:16:42,196: DEBUG]: Starting the Kafka producer
[2018-11-21 12:16:42,196: DEBUG]: Attempting to bootstrap via node at localhost:9094
[2018-11-21 12:16:42,205: DEBUG]: <AIOKafkaConnection host=localhost port=9094> Response 1: MetadataResponse_v0(brokers=[(node_id=1, host='localhost', port=9094)], topics=[])
[2018-11-21 12:16:42,205: DEBUG]: Updated cluster metadata to ClusterMetadata(brokers: 1, topics: 0, groups: 0)
[2018-11-21 12:16:42,206: DEBUG]: Closing connection at localhost:9094
[2018-11-21 12:16:42,206: DEBUG]: Received cluster metadata: ClusterMetadata(brokers: 1, topics: 0, groups: 0)
[2018-11-21 12:16:42,206: DEBUG]: Initiating connection to node 1 at localhost:9094
[2018-11-21 12:16:42,210: DEBUG]: <AIOKafkaConnection host=localhost port=9094> Response 1: ApiVersionResponse_v0(error_code=0, api_versions=[(api_key=0, min_version=0, max_version=5), (api_key=1, min_version=0, max_version=6), (api_key=2, min_version=0, max_version=2), (api_key=3, min_version=0, max_version=5), (api_key=4, min_version=0, max_version=1), (api_key=5, min_version=0, max_version=0), (api_key=6, min_version=0, max_version=4), (api_key=7, min_version=0, max_version=1), (api_key=8, min_version=0, max_version=3), (api_key=9, min_version=0, max_version=3), (api_key=10, min_version=0, max_version=1), (api_key=11, min_version=0, max_version=2), (api_key=12, min_version=0, max_version=1), (api_key=13, min_version=0, max_version=1), (api_key=14, min_version=0, max_version=1), (api_key=15, min_version=0, max_version=1), (api_key=16, min_version=0, max_version=1), (api_key=17, min_version=0, max_version=1), (api_key=18, min_version=0, max_version=1), (api_key=19, min_version=0, max_version=2), (api_key=20, min_version=0, max_version=1), (api_key=21, min_version=0, max_version=0), (api_key=22, min_version=0, max_version=0), (api_key=23, min_version=0, max_version=0), (api_key=24, min_version=0, max_version=0), (api_key=25, min_version=0, max_version=0), (api_key=26, min_version=0, max_version=0), (api_key=27, min_version=0, max_version=0), (api_key=28, min_version=0, max_version=0), (api_key=29, min_version=0, max_version=0), (api_key=30, min_version=0, max_version=0), (api_key=31, min_version=0, max_version=0), (api_key=32, min_version=0, max_version=0), (api_key=33, min_version=0, max_version=0), (api_key=34, min_version=0, max_version=0), (api_key=35, min_version=0, max_version=0), (api_key=36, min_version=0, max_version=0), (api_key=37, min_version=0, max_version=0)])
[2018-11-21 12:16:42,212: DEBUG]: <AIOKafkaConnection host=localhost port=9094> Response 2: MetadataResponse_v0(brokers=[(node_id=1, host='localhost', port=9094)], topics=[])
[2018-11-21 12:16:42,212: DEBUG]: Closing connection at localhost:9094
[2018-11-21 12:16:42,212: DEBUG]: Kafka producer started
[2018-11-21 12:16:42,212: DEBUG]: [^--Producer]: Started.
[2018-11-21 12:16:42,212: INFO]: [^--Consumer]: Starting...
[2018-11-21 12:16:42,212: DEBUG]: Attempting to bootstrap via node at localhost:9094
[2018-11-21 12:16:42,214: DEBUG]: <AIOKafkaConnection host=localhost port=9094> Response 1: MetadataResponse_v0(brokers=[(node_id=1, host='localhost', port=9094)], topics=[])
[2018-11-21 12:16:42,215: DEBUG]: Updated cluster metadata to ClusterMetadata(brokers: 1, topics: 0, groups: 0)
[2018-11-21 12:16:42,215: DEBUG]: Closing connection at localhost:9094
[2018-11-21 12:16:42,215: DEBUG]: Received cluster metadata: ClusterMetadata(brokers: 1, topics: 0, groups: 0)
[2018-11-21 12:16:42,215: DEBUG]: Initiating connection to node 1 at localhost:9094
[2018-11-21 12:16:42,217: DEBUG]: <AIOKafkaConnection host=localhost port=9094> Response 1: ApiVersionResponse_v0(error_code=0, api_versions=[(api_key=0, min_version=0, max_version=5), (api_key=1, min_version=0, max_version=6), (api_key=2, min_version=0, max_version=2), (api_key=3, min_version=0, max_version=5), (api_key=4, min_version=0, max_version=1), (api_key=5, min_version=0, max_version=0), (api_key=6, min_version=0, max_version=4), (api_key=7, min_version=0, max_version=1), (api_key=8, min_version=0, max_version=3), (api_key=9, min_version=0, max_version=3), (api_key=10, min_version=0, max_version=1), (api_key=11, min_version=0, max_version=2), (api_key=12, min_version=0, max_version=1), (api_key=13, min_version=0, max_version=1), (api_key=14, min_version=0, max_version=1), (api_key=15, min_version=0, max_version=1), (api_key=16, min_version=0, max_version=1), (api_key=17, min_version=0, max_version=1), (api_key=18, min_version=0, max_version=1), (api_key=19, min_version=0, max_version=2), (api_key=20, min_version=0, max_version=1), (api_key=21, min_version=0, max_version=0), (api_key=22, min_version=0, max_version=0), (api_key=23, min_version=0, max_version=0), (api_key=24, min_version=0, max_version=0), (api_key=25, min_version=0, max_version=0), (api_key=26, min_version=0, max_version=0), (api_key=27, min_version=0, max_version=0), (api_key=28, min_version=0, max_version=0), (api_key=29, min_version=0, max_version=0), (api_key=30, min_version=0, max_version=0), (api_key=31, min_version=0, max_version=0), (api_key=32, min_version=0, max_version=0), (api_key=33, min_version=0, max_version=0), (api_key=34, min_version=0, max_version=0), (api_key=35, min_version=0, max_version=0), (api_key=36, min_version=0, max_version=0), (api_key=37, min_version=0, max_version=0)])
[2018-11-21 12:16:42,218: DEBUG]: <AIOKafkaConnection host=localhost port=9094> Response 2: MetadataResponse_v0(brokers=[(node_id=1, host='localhost', port=9094)], topics=[])
[2018-11-21 12:16:42,218: DEBUG]: Closing connection at localhost:9094
[2018-11-21 12:16:42,219: DEBUG]: [^--Consumer]: Started.
[2018-11-21 12:16:42,219: INFO]: [^--LeaderAssignor]: Starting...
[2018-11-21 12:16:42,219: INFO]: [^--Producer]: Creating topic myapp-__assignor-__leader
[2018-11-21 12:16:42,219: DEBUG]: Initiating connection to node 1 at localhost:9094
[2018-11-21 12:16:42,222: DEBUG]: <AIOKafkaConnection host=localhost port=9094> Response 1: MetadataResponse_v1(brokers=[(node_id=1, host='localhost', port=9094, rack=None)], controller_id=1, topics=[])
[2018-11-21 12:16:42,222: INFO]: [^--Producer]: Found controller: 1
[2018-11-21 12:16:42,761: DEBUG]: <AIOKafkaConnection host=localhost port=9094> Response 2: CreateTopicsResponse_v1(topic_error_codes=[(topic='myapp-__assignor-__leader', error_code=0, error_message=None)])
[2018-11-21 12:16:42,761: INFO]: [^--Producer]: Topic myapp-__assignor-__leader created.
[2018-11-21 12:16:42,761: DEBUG]: [^--LeaderAssignor]: Started.
[2018-11-21 12:16:42,761: INFO]: [^--ReplyConsumer]: Starting...
[2018-11-21 12:16:42,761: DEBUG]: [^--ReplyConsumer]: Started.
[2018-11-21 12:16:42,761: INFO]: [^AgentManager]: Starting...
[2018-11-21 12:16:42,762: INFO]: [^Agent: api.app.myagent]: Starting...
[2018-11-21 12:16:42,763: INFO]: [^-OneForOneSupervisor]: Starting...
[2018-11-21 12:16:42,763: DEBUG]: [^-Agent*: api.app.myagent]: Starting...
[2018-11-21 12:16:42,763: DEBUG]: [^-Agent*: api.app.myagent]: Started.
[2018-11-21 12:16:42,763: DEBUG]: [^-OneForOneSupervisor]: Started.
[2018-11-21 12:16:42,763: DEBUG]: [^Agent: api.app.myagent]: Started.
[2018-11-21 12:16:42,763: DEBUG]: [^AgentManager]: Started.
[2018-11-21 12:16:42,763: INFO]: [^--Conductor]: Starting...
[2018-11-21 12:16:42,763: DEBUG]: [^--Conductor]: Started.
[2018-11-21 12:16:42,763: INFO]: [^--TableManager]: Starting...
[2018-11-21 12:16:42,764: DEBUG]: [^--Stream: Topic: topic1]: Starting...
[2018-11-21 12:16:42,764: DEBUG]: [^--Stream: Topic: topic1]: Started.
[2018-11-21 12:16:43,765: INFO]: [^--Table: topic1]: Starting...
[2018-11-21 12:16:43,766: INFO]: [^--Producer]: Creating topic myapp-topic1-changelog
[2018-11-21 12:16:43,771: DEBUG]: <AIOKafkaConnection host=localhost port=9094> Response 3: MetadataResponse_v1(brokers=[(node_id=1, host='localhost', port=9094, rack=None)], controller_id=1, topics=[])
[2018-11-21 12:16:43,772: INFO]: [^--Producer]: Found controller: 1
[2018-11-21 12:16:44,766: INFO]: [^--Producer]: Creating topic myapp-__assignor-__leader
[2018-11-21 12:16:45,165: DEBUG]: <AIOKafkaConnection host=localhost port=9094> Response 4: CreateTopicsResponse_v1(topic_error_codes=[(topic='myapp-topic1-changelog', error_code=0, error_message=None)])
[2018-11-21 12:16:45,166: INFO]: [^--Producer]: Topic myapp-topic1-changelog created.
[2018-11-21 12:16:45,166: DEBUG]: [^--Table: topic1]: Started.
[2018-11-21 12:16:45,166: INFO]: [^--Table: topic2]: Starting...
[2018-11-21 12:16:45,166: INFO]: [^--Producer]: Creating topic myapp-topic2-changelog
[2018-11-21 12:16:45,167: DEBUG]: <AIOKafkaConnection host=localhost port=9094> Response 5: MetadataResponse_v1(brokers=[(node_id=1, host='localhost', port=9094, rack=None)], controller_id=1, topics=[])
[2018-11-21 12:16:45,167: INFO]: [^--Producer]: Found controller: 1
[2018-11-21 12:16:45,167: DEBUG]: <AIOKafkaConnection host=localhost port=9094> Response 6: MetadataResponse_v1(brokers=[(node_id=1, host='localhost', port=9094, rack=None)], controller_id=1, topics=[])
[2018-11-21 12:16:45,168: INFO]: [^--Producer]: Found controller: 1
[2018-11-21 12:16:45,173: DEBUG]: <AIOKafkaConnection host=localhost port=9094> Response 7: CreateTopicsResponse_v1(topic_error_codes=[(topic='myapp-__assignor-__leader', error_code=36, error_message="Topic 'myapp-__assignor-__leader' already exists.")])
[2018-11-21 12:16:45,173: DEBUG]: [^--Producer]: Topic myapp-__assignor-__leader exists, skipping creation.
[2018-11-21 12:16:46,494: DEBUG]: <AIOKafkaConnection host=localhost port=9094> Response 8: CreateTopicsResponse_v1(topic_error_codes=[(topic='myapp-topic2-changelog', error_code=0, error_message=None)])
[2018-11-21 12:16:46,494: INFO]: [^--Producer]: Topic myapp-topic2-changelog created.
[2018-11-21 12:16:46,494: DEBUG]: [^--Table: topic2]: Started.
[2018-11-21 12:16:46,495: INFO]: [^---Recovery]: Starting...
[2018-11-21 12:16:46,495: DEBUG]: [^---Recovery]: Started.
[2018-11-21 12:16:46,495: DEBUG]: [^--TableManager]: Started.
[2018-11-21 12:16:46,495: DEBUG]: [^-App]: Started.
[2018-11-21 12:16:46,495: INFO]: [^---Recovery]: WAITING FOR NEXT RECOVERY TO START
[2018-11-21 12:17:26,124: INFO]: [^Worker]: Stopping on signal received...
[2018-11-21 12:17:26,124: INFO]: [^Worker]: Stopping...
[2018-11-21 12:17:26,124: INFO]: [^-App]: Stopping...
[2018-11-21 12:17:26,124: INFO]: [^-App]: Flush producer buffer...
[2018-11-21 12:17:26,124: INFO]: [^--TableManager]: Stopping...
[2018-11-21 12:17:26,124: INFO]: [^--Fetcher]: Stopping...
[2018-11-21 12:17:26,125: DEBUG]: [^--Fetcher]: Shutting down...
[2018-11-21 12:17:26,125: INFO]: [^--Fetcher]: -Stopped!
[2018-11-21 12:17:26,125: INFO]: [^---Recovery]: Stopping...
[2018-11-21 12:17:26,125: DEBUG]: [^---Recovery]: Shutting down...
[2018-11-21 12:17:26,125: DEBUG]: [^Worker]: Started.
[2018-11-21 12:17:26,126: INFO]: [^---Recovery]: Terminating cancelled task: <coroutine object Recovery._slurp_changelogs at 0x7f12afaac448>
[2018-11-21 12:17:26,126: INFO]: [^Worker]: Gathering service tasks...
[2018-11-21 12:17:26,126: INFO]: [^Worker]: Gathering all futures...
[2018-11-21 12:17:26,127: INFO]: [^---Recovery]: -Stopped!
[2018-11-21 12:17:26,127: INFO]: [^--Table: topic1]: Stopping...
[2018-11-21 12:17:26,127: DEBUG]: [^--Table: topic1]: Shutting down...
[2018-11-21 12:17:26,127: INFO]: [^--Table: topic1]: -Stopped!
[2018-11-21 12:17:26,127: INFO]: [^--Table: topic2]: Stopping...
[2018-11-21 12:17:26,127: DEBUG]: [^--Table: topic2]: Shutting down...
[2018-11-21 12:17:26,127: INFO]: [^--Table: topic2]: -Stopped!
[2018-11-21 12:17:26,127: DEBUG]: [^--TableManager]: Shutting down...
[2018-11-21 12:17:26,127: INFO]: [^--TableManager]: -Stopped!
[2018-11-21 12:17:26,127: INFO]: [^--Conductor]: Stopping...
[2018-11-21 12:17:26,127: DEBUG]: [^--Conductor]: Shutting down...
[2018-11-21 12:17:26,128: INFO]: [^--Conductor]: -Stopped!
[2018-11-21 12:17:26,128: INFO]: [^AgentManager]: Stopping...
[2018-11-21 12:17:26,128: INFO]: [^Agent: api.app.myagent]: Stopping...
[2018-11-21 12:17:26,128: INFO]: [^-OneForOneSupervisor]: Stopping...
[2018-11-21 12:17:26,128: DEBUG]: [^-Agent*: api.app.myagent]: Stopping...
[2018-11-21 12:17:26,128: DEBUG]: [^-Agent*: api.app.myagent]: Shutting down...
[2018-11-21 12:17:26,128: DEBUG]: [^-Agent*: api.app.myagent]: Terminating cancelled task: <Task cancelled coro=<Agent._execute_task() done, defined at /var/lib/laminar/run/ameto/2277/ameto/api/venv/lib/python3.7/site-packages/faust/agents/agent.py:536>>
[2018-11-21 12:17:26,129: DEBUG]: [^-Agent*: api.app.myagent]: -Stopped!
[2018-11-21 12:17:26,129: DEBUG]: [^-OneForOneSupervisor]: Shutting down...
[2018-11-21 12:17:26,129: INFO]: [^-OneForOneSupervisor]: Terminating cancelled task: <coroutine object SupervisorStrategy._supervisor at 0x7f12afa93448>
[2018-11-21 12:17:26,129: INFO]: [^-OneForOneSupervisor]: -Stopped!
[2018-11-21 12:17:26,130: DEBUG]: [^Agent: api.app.myagent]: Shutting down...
[2018-11-21 12:17:26,130: INFO]: [^Agent: api.app.myagent]: -Stopped!
[2018-11-21 12:17:26,130: DEBUG]: [^AgentManager]: Shutting down...
[2018-11-21 12:17:26,130: INFO]: [^AgentManager]: -Stopped!
[2018-11-21 12:17:26,130: INFO]: [^--ReplyConsumer]: Stopping...
[2018-11-21 12:17:26,130: DEBUG]: [^--ReplyConsumer]: Shutting down...
[2018-11-21 12:17:26,130: INFO]: [^--ReplyConsumer]: -Stopped!
[2018-11-21 12:17:26,130: INFO]: [^--LeaderAssignor]: Stopping...
[2018-11-21 12:17:26,130: DEBUG]: [^--LeaderAssignor]: Shutting down...
[2018-11-21 12:17:26,130: INFO]: [^--LeaderAssignor]: -Stopped!
[2018-11-21 12:17:26,130: INFO]: [^--Consumer]: Stopping...
[2018-11-21 12:17:26,131: INFO]: [^--Consumer]: COMMITTING AGAIN AFTER STREAMS DONE
[2018-11-21 12:17:26,131: DEBUG]: Closing the KafkaConsumer.
[2018-11-21 12:17:26,131: DEBUG]: Closing connection at localhost:9094
[2018-11-21 12:17:26,131: DEBUG]: The KafkaConsumer has closed.
[2018-11-21 12:17:26,131: DEBUG]: [^--Consumer]: Shutting down...
[2018-11-21 12:17:26,131: INFO]: [^--Consumer]: -Stopped!
[2018-11-21 12:17:26,132: INFO]: [^--Producer]: Stopping...
[2018-11-21 12:17:26,132: DEBUG]: Closing connection at localhost:9094
[2018-11-21 12:17:26,132: DEBUG]: The Kafka producer has closed.
[2018-11-21 12:17:26,132: DEBUG]: [^--Producer]: Shutting down...
[2018-11-21 12:17:26,132: INFO]: [^--Producer]: -Stopped!
[2018-11-21 12:17:26,132: INFO]: [^--Monitor]: Stopping...
[2018-11-21 12:17:26,132: DEBUG]: [^--Monitor]: Shutting down...
[2018-11-21 12:17:26,132: INFO]: [^--Monitor]: -Stopped!
[2018-11-21 12:17:26,133: DEBUG]: [^-App]: Shutting down...
[2018-11-21 12:17:26,133: INFO]: [^-App]: -Stopped!
[2018-11-21 12:17:26,133: DEBUG]: [^Worker]: Shutting down...
[2018-11-21 12:17:26,133: INFO]: [^Worker]: -Stopped!
[2018-11-21 12:17:27,230: INFO]: [^Worker]: Closing event loop
Versions
- Python version: 3.6.6 and 3.7
- Faust version: 1.3.2
- Operating system: Linux
- Kafka version: 1.0.1
- RocksDB version (if applicable)
Can you try setting App(stream_wait_empty=False, ...)?
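For reference, here is a minimal sketch of where that setting would go in the example app from the top of this issue; the broker URL and store are simply the values already used there, and stream_wait_empty is the setting suggested above.
import faust

# Sketch only: the minimal example from this issue with the suggested
# stream_wait_empty setting applied.
app = faust.App(
    'myapp',
    broker='kafka://localhost:9094',
    store='memory://',
    stream_wait_empty=False,
)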
Adding stream_wait_empty=False does not make a difference.
However, it does work with Kafka 2.1.0. Could this be an issue with the AdminClient implementation, or a Kafka bug?
Maybe I dismissed the issue too quickly: as soon as I add more topics and changelog topics to the example code above, the application starts hanging again during initialization, even with Kafka 2.1.0. Right now, I am testing with four topics and four tables.
A git bisect states that aa15f93bb03a8631d0e96a95b8b9a07668ae191f is the first bad commit.
Also: when all topics and changelog topics are already present in the Kafka cluster and Faust does not have to create them, the application starts up fine. The app initialization hangs only when Faust has to create the topics.
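A possible workaround following from that observation (not something proposed in this thread) would be to pre-create all topics before starting the worker. The sketch below uses the kafka-python admin client, which is an assumption on my part; the topic names, partition count and broker address are illustrative only.
# Hypothetical workaround sketch: pre-create the topics so Faust never has to
# create them at startup. Assumes the kafka-python package is installed; the
# topic names, partition count and broker address below are placeholders.
from kafka.admin import KafkaAdminClient, NewTopic
from kafka.errors import TopicAlreadyExistsError

admin = KafkaAdminClient(bootstrap_servers='localhost:9094')
topics = [
    NewTopic(name=name, num_partitions=8, replication_factor=1)
    for name in ('topic1', 'topic2',
                 'myapp-topic1-changelog', 'myapp-topic2-changelog')
]
try:
    admin.create_topics(topics)
except TopicAlreadyExistsError:
    # Depending on the client version, existing topics may be raised as an
    # error or reported in the response; either way they are fine here.
    pass
admin.close()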
Hmm. When I reset the state of my local Kafka it still works, but maybe it depends on the version.
topic2 is not used by any agent, so I'm not sure how commenting out that line would change anything. table1 + table2 will have an effect on startup, but does it really work if you comment out one of the tables? (Is having two tables necessary to reproduce?)
Here's what happens when the topics do not exist for me:
(def) /opt/devel/faust(master)$> python examples/withdrawals.py worker -l info
┌ƒaµS† v1.4.2────────────────────────────────────────────┐
│ id │ faust-withdrawals-v3 │
│ transport │ [URL('aiokafka:')] │
│ store │ rocksdb: │
│ web │ http://localhost:6066/ │
│ log │ -stderr- (info) │
│ pid │ 75061 │
│ hostname │ grainstate.local │
│ platform │ CPython 3.7.0 (Darwin x86_64) │
│ drivers │ aiokafka=0.5.1 aiohttp=3.5.2 │
│ datadir │ /opt/devel/faust/faust-withdrawals-data │
│ appdir │ /opt/devel/faust/faust-withdrawals-data/v3 │
└───────────┴────────────────────────────────────────────┘
[2019-01-14 16:47:42,159: INFO]: [^Worker]: Starting...
[2019-01-14 16:47:42,162: INFO]: [^-App]: Starting...
[2019-01-14 16:47:42,162: INFO]: [^--Monitor]: Starting...
[2019-01-14 16:47:42,163: INFO]: [^--Producer]: Starting...
[2019-01-14 16:47:42,176: INFO]: [^CacheBackend]: Starting...
[2019-01-14 16:47:42,176: INFO]: [^Web]: Starting...
[2019-01-14 16:47:42,189: INFO]: [^-Server]: Starting...
[2019-01-14 16:47:42,190: INFO]: [^--Consumer]: Starting...
[2019-01-14 16:47:42,190: INFO]: [^---MethodQueue]: Starting...
[2019-01-14 16:47:42,191: INFO]: [^---AIOKafkaConsumerThread]: Starting...
[2019-01-14 16:47:42,197: INFO]: [^----MethodQueue]: Starting...
[2019-01-14 16:47:43,164: INFO]: [^--LeaderAssignor]: Starting...
[2019-01-14 16:47:43,164: INFO]: [^--Producer]: Creating topic faust-withdrawals-v3-__assignor-__leader
[2019-01-14 16:47:43,168: INFO]: [^--Producer]: Found controller: 0
[2019-01-14 16:47:43,328: INFO]: [^--Producer]: Topic faust-withdrawals-v3-__assignor-__leader created.
[2019-01-14 16:47:43,329: INFO]: [^--ReplyConsumer]: Starting...
[2019-01-14 16:47:43,329: INFO]: [^AgentManager]: Starting...
[2019-01-14 16:47:43,329: INFO]: [^Agent: __main__.track_user_withdrawal]: Starting...
[2019-01-14 16:47:43,330: INFO]: [^-OneForOneSupervisor]: Starting...
[2019-01-14 16:47:43,331: INFO]: [^Agent: __main__.track_country_withdrawal]: Starting...
[2019-01-14 16:47:43,332: INFO]: [^-OneForOneSupervisor]: Starting...
[2019-01-14 16:47:43,332: INFO]: [^--Conductor]: Starting...
[2019-01-14 16:47:43,332: INFO]: [^--TableManager]: Starting...
[2019-01-14 16:47:43,333: INFO]: [^--Producer]: Creating topic faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal-faust-withdrawals-v3-Withdrawal.country-repartition
[2019-01-14 16:47:43,334: INFO]: [^--Producer]: Found controller: 0
[2019-01-14 16:47:43,390: INFO]: [^--Producer]: Topic faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal-faust-withdrawals-v3-Withdrawal.country-repartition created.
[2019-01-14 16:47:44,334: INFO]: [^--Table: user_to_total]: Starting...
[2019-01-14 16:47:44,427: INFO]: [^---Store: user_to_total]: Starting...
[2019-01-14 16:47:44,428: INFO]: [^--Producer]: Creating topic faust-withdrawals-v3-user_to_total-changelog
[2019-01-14 16:47:44,430: INFO]: [^--Producer]: Found controller: 0
[2019-01-14 16:47:44,501: INFO]: [^--Producer]: Topic faust-withdrawals-v3-user_to_total-changelog created.
[2019-01-14 16:47:44,502: INFO]: [^--Table: country_to_total]: Starting...
[2019-01-14 16:47:44,502: INFO]: [^---Store: country_to_total]: Starting...
[2019-01-14 16:47:44,503: INFO]: [^--Producer]: Creating topic faust-withdrawals-v3-country_to_total-changelog
[2019-01-14 16:47:44,504: INFO]: [^--Producer]: Found controller: 0
[2019-01-14 16:47:44,565: INFO]: [^--Producer]: Topic faust-withdrawals-v3-country_to_total-changelog created.
[2019-01-14 16:47:44,566: INFO]: [^---Recovery]: Starting...
[2019-01-14 16:47:45,334: INFO]: [^--Producer]: Creating topic faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal
[2019-01-14 16:47:45,337: INFO]: [^--Producer]: Found controller: 0
[2019-01-14 16:47:45,396: INFO]: [^--Producer]: Topic faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal created.
[2019-01-14 16:47:45,396: INFO]: [^--Producer]: Creating topic faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal-faust-withdrawals-v3-Withdrawal.country-repartition
[2019-01-14 16:47:45,397: INFO]: [^--Producer]: Found controller: 0
[2019-01-14 16:47:45,406: INFO]: [^--Producer]: Creating topic faust-withdrawals-v3-__assignor-__leader
[2019-01-14 16:47:45,407: INFO]: [^--Producer]: Found controller: 0
[2019-01-14 16:47:45,410: INFO]: [^--Producer]: Creating topic faust-withdrawals-v3-country_to_total-changelog
[2019-01-14 16:47:45,412: INFO]: [^--Producer]: Found controller: 0
[2019-01-14 16:47:45,415: INFO]: [^--Producer]: Creating topic faust-withdrawals-v3-user_to_total-changelog
[2019-01-14 16:47:45,416: INFO]: [^--Producer]: Found controller: 0
[2019-01-14 16:47:46,200: INFO]: Updating subscribed topics to: frozenset({'faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal', 'faust-withdrawals-v3-user_to_total-changelog', 'faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal-faust-withdrawals-v3-Withdrawal.country-repartition', 'faust-withdrawals-v3-__assignor-__leader', 'faust-withdrawals-v3-country_to_total-changelog', 'withdrawals'})
[2019-01-14 16:47:46,201: INFO]: Subscribed to topic(s): {'faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal', 'faust-withdrawals-v3-user_to_total-changelog', 'faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal-faust-withdrawals-v3-Withdrawal.country-repartition', 'faust-withdrawals-v3-__assignor-__leader', 'faust-withdrawals-v3-country_to_total-changelog', 'withdrawals'}
[2019-01-14 16:47:46,229: ERROR]: Group Coordinator Request failed: [Error 15] CoordinatorNotAvailableError
[2019-01-14 16:47:46,250: WARNING]: Topic 'withdrawals' is not available during auto-create initialization
[2019-01-14 16:47:46,356: ERROR]: Group Coordinator Request failed: [Error 15] CoordinatorNotAvailableError
[2019-01-14 16:47:46,362: WARNING]: Topic 'withdrawals' is not available during auto-create initialization
[2019-01-14 16:47:46,467: ERROR]: Group Coordinator Request failed: [Error 15] CoordinatorNotAvailableError
[2019-01-14 16:47:46,473: WARNING]: Topic 'withdrawals' is not available during auto-create initialization
[2019-01-14 16:47:46,580: ERROR]: Group Coordinator Request failed: [Error 15] CoordinatorNotAvailableError
[2019-01-14 16:47:46,585: WARNING]: Topic 'withdrawals' is not available during auto-create initialization
[2019-01-14 16:47:46,692: INFO]: Discovered coordinator 0 for group faust-withdrawals-v3
[2019-01-14 16:47:46,693: INFO]: Revoking previously assigned partitions set() for group faust-withdrawals-v3
[2019-01-14 16:47:47,201: INFO]: (Re-)joining group faust-withdrawals-v3
[2019-01-14 16:47:47,248: INFO]: Joined group 'faust-withdrawals-v3' (generation 1) with member_id faust-1.4.2-860504a6-a071-4a95-8591-9f3f34e0ff8b
[2019-01-14 16:47:47,248: INFO]: Elected group leader -- performing partition assignments using faust
[2019-01-14 16:47:47,249: WARNING]: Ignoring missing topic: withdrawals
[2019-01-14 16:47:47,294: INFO]: Successfully synced group faust-withdrawals-v3 with generation 1
[2019-01-14 16:47:47,294: ERROR]: Rejoining group -- Need to rejoin! -- Topics not yet created: {'withdrawals'}
[2019-01-14 16:47:47,297: INFO]: (Re-)joining group faust-withdrawals-v3
[2019-01-14 16:47:47,301: INFO]: Joined group 'faust-withdrawals-v3' (generation 2) with member_id faust-1.4.2-860504a6-a071-4a95-8591-9f3f34e0ff8b
[2019-01-14 16:47:47,301: INFO]: Elected group leader -- performing partition assignments using faust
[2019-01-14 16:47:47,304: INFO]: Successfully synced group faust-withdrawals-v3 with generation 2
[2019-01-14 16:47:47,305: INFO]: Setting newly assigned partitions {TopicPartition(topic='faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal-faust-withdrawals-v3-Withdrawal.country-repartition', partition=3), TopicPartition(topic='withdrawals', partition=3), TopicPartition(topic='faust-withdrawals-v3-country_to_total-changelog', partition=2), TopicPartition(topic='faust-withdrawals-v3-user_to_total-changelog', partition=0), TopicPartition(topic='faust-withdrawals-v3-__assignor-__leader', partition=0), TopicPartition(topic='faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal-faust-withdrawals-v3-Withdrawal.country-repartition', partition=1), TopicPartition(topic='faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal', partition=1), TopicPartition(topic='withdrawals', partition=1), TopicPartition(topic='faust-withdrawals-v3-user_to_total-changelog', partition=2), TopicPartition(topic='faust-withdrawals-v3-country_to_total-changelog', partition=0), TopicPartition(topic='faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal', partition=3), TopicPartition(topic='withdrawals', partition=2), TopicPartition(topic='faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal-faust-withdrawals-v3-Withdrawal.country-repartition', partition=2), TopicPartition(topic='faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal', partition=0), TopicPartition(topic='withdrawals', partition=0), TopicPartition(topic='faust-withdrawals-v3-user_to_total-changelog', partition=1), TopicPartition(topic='faust-withdrawals-v3-country_to_total-changelog', partition=3), TopicPartition(topic='faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal-faust-withdrawals-v3-Withdrawal.country-repartition', partition=0), TopicPartition(topic='faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal', partition=2), TopicPartition(topic='faust-withdrawals-v3-country_to_total-changelog', partition=1), TopicPartition(topic='faust-withdrawals-v3-user_to_total-changelog', partition=3)} for group faust-withdrawals-v3
[2019-01-14 16:47:52,194: INFO]: [^---Recovery]: Highwater for active changelog partitions:
┌Highwater - Active───────────────────────────────┬───────────┬───────────┐
│ topic │ partition │ highwater │
├─────────────────────────────────────────────────┼───────────┼───────────┤
│ faust-withdrawals-v3-user_to_total-changelog │ 0 │ -1 │
│ faust-withdrawals-v3-user_to_total-changelog │ 1 │ -1 │
│ faust-withdrawals-v3-user_to_total-changelog │ 2 │ -1 │
│ faust-withdrawals-v3-user_to_total-changelog │ 3 │ -1 │
│ faust-withdrawals-v3-country_to_total-changelog │ 0 │ -1 │
│ faust-withdrawals-v3-country_to_total-changelog │ 1 │ -1 │
│ faust-withdrawals-v3-country_to_total-changelog │ 2 │ -1 │
│ faust-withdrawals-v3-country_to_total-changelog │ 3 │ -1 │
└─────────────────────────────────────────────────┴───────────┴───────────┘
[2019-01-14 16:47:53,510: INFO]: [^---Recovery]: active offsets at start of reading:
┌Reading Starts At - Active───────────────────────┬───────────┬────────┐
│ topic │ partition │ offset │
├─────────────────────────────────────────────────┼───────────┼────────┤
│ faust-withdrawals-v3-country_to_total-changelog │ 2 │ -1 │
│ faust-withdrawals-v3-user_to_total-changelog │ 2 │ -1 │
│ faust-withdrawals-v3-user_to_total-changelog │ 1 │ -1 │
│ faust-withdrawals-v3-country_to_total-changelog │ 1 │ -1 │
│ faust-withdrawals-v3-user_to_total-changelog │ 0 │ -1 │
│ faust-withdrawals-v3-country_to_total-changelog │ 0 │ -1 │
│ faust-withdrawals-v3-country_to_total-changelog │ 3 │ -1 │
│ faust-withdrawals-v3-user_to_total-changelog │ 3 │ -1 │
└─────────────────────────────────────────────────┴───────────┴────────┘
[2019-01-14 16:47:54,201: INFO]: [^---Recovery]: standby offsets at start of reading:
┌Reading Starts At - Standby─┐
│ topic │ partition │ offset │
└───────┴───────────┴────────┘
[2019-01-14 16:47:54,569: INFO]: [^---Recovery]: Resuming flow...
[2019-01-14 16:47:54,569: INFO]: [^---Recovery]: Recovery complete
[2019-01-14 16:47:54,670: INFO]: [^---Recovery]: Restore complete!
[2019-01-14 16:47:54,671: INFO]: [^---Recovery]: Seek stream partitions to committed offsets.
[2019-01-14 16:47:55,203: INFO]: [^--Fetcher]: Starting...
[2019-01-14 16:47:55,203: INFO]: [^---Recovery]: Worker ready
[2019-01-14 16:47:55,204: INFO]: [^Worker]: Ready
I finally was able to come back to this and tested it with faust-1.4.8 and faust-1.4.9. The issue is still reproducible in both versions.
The second table is necessary to reproduce the issue. With the second table, I am getting the following log output:
$ faust --loglevel info --app app.app worker --without-web
┌ƒaµS† v1.4.9──────────────────────────────────────────┐
│ id │ myapp │
│ transport │ [URL('kafka://localhost:9092')] │
│ store │ memory: │
│ log │ -stderr- (info) │
│ pid │ 28700 │
│ hostname │ webserver │
│ platform │ CPython 3.6.8 (Linux x86_64) │
│ drivers │ aiokafka=0.4.19 aiohttp=3.5.4 │
│ datadir │ /tmp/faust-test/myapp-data │
│ appdir │ /tmp/faust-test/myapp-data/v1 │
└───────────┴──────────────────────────────────────────┘
[2019-03-21 11:32:19,845: INFO]: [^Worker]: Starting...
[2019-03-21 11:32:19,847: INFO]: [^-App]: Starting...
[2019-03-21 11:32:19,847: INFO]: [^--Monitor]: Starting...
[2019-03-21 11:32:19,848: INFO]: [^--Producer]: Starting...
[2019-03-21 11:32:19,870: INFO]: [^--Consumer]: Starting...
[2019-03-21 11:32:19,871: INFO]: [^---MethodQueue]: Starting...
[2019-03-21 11:32:19,871: INFO]: [^----MethodQueueWorker]: Starting...
[2019-03-21 11:32:19,871: INFO]: [^----MethodQueueWorker]: Starting...
[2019-03-21 11:32:19,871: INFO]: [^---ConsumerThread]: Starting...
[2019-03-21 11:32:19,879: INFO]: [^----MethodQueue]: Starting...
[2019-03-21 11:32:19,880: INFO]: [^-----MethodQueueWorker]: Starting...
[2019-03-21 11:32:19,880: INFO]: [^-----MethodQueueWorker]: Starting...
[2019-03-21 11:32:20,850: INFO]: [^--LeaderAssignor]: Starting...
[2019-03-21 11:32:20,852: INFO]: [^--Producer]: Creating topic myapp-__assignor-__leader
[2019-03-21 11:32:20,862: INFO]: [^--Producer]: Found controller: 1
[2019-03-21 11:32:21,306: INFO]: [^--Producer]: Topic myapp-__assignor-__leader created.
[2019-03-21 11:32:21,306: INFO]: [^--ReplyConsumer]: Starting...
[2019-03-21 11:32:21,306: INFO]: [^--AgentManager]: Starting...
[2019-03-21 11:32:21,307: INFO]: [^Agent: app.myagent]: Starting...
[2019-03-21 11:32:21,311: INFO]: [^-OneForOneSupervisor]: Starting...
[2019-03-21 11:32:21,311: INFO]: [^--Conductor]: Starting...
[2019-03-21 11:32:21,312: INFO]: [^--TableManager]: Starting...
[2019-03-21 11:32:22,314: INFO]: [^--Table: topic1]: Starting...
[2019-03-21 11:32:22,318: INFO]: [^---Store: topic1]: Starting...
[2019-03-21 11:32:22,319: INFO]: [^--Producer]: Creating topic myapp-topic1-changelog
[2019-03-21 11:32:22,325: INFO]: [^--Producer]: Found controller: 1
[2019-03-21 11:32:23,405: INFO]: [^--Producer]: Topic myapp-topic1-changelog created.
[2019-03-21 11:32:23,406: INFO]: [^--Table: topic2]: Starting...
[2019-03-21 11:32:23,407: INFO]: [^---Store: topic2]: Starting...
[2019-03-21 11:32:23,408: INFO]: [^--Producer]: Creating topic myapp-topic2-changelog
[2019-03-21 11:32:23,408: INFO]: [^--Producer]: Creating topic myapp-__assignor-__leader
[2019-03-21 11:32:23,410: INFO]: [^--Producer]: Found controller: 1
[2019-03-21 11:32:23,412: INFO]: [^--Producer]: Found controller: 1
[2019-03-21 11:32:24,475: INFO]: [^--Producer]: Topic myapp-topic2-changelog created.
[2019-03-21 11:32:24,476: INFO]: [^---Recovery]: Starting...
If I comment out topic2_table, the worker starts successfully:
$ faust --loglevel info --app app.app worker --without-web
┌ƒaµS† v1.4.9──────────────────────────────────────────┐
│ id │ myapp │
│ transport │ [URL('kafka://localhost:9092')] │
│ store │ memory: │
│ log │ -stderr- (info) │
│ pid │ 19870 │
│ hostname │ webserver │
│ platform │ CPython 3.6.8 (Linux x86_64) │
│ drivers │ aiokafka=0.4.19 aiohttp=3.5.4 │
│ datadir │ /tmp/faust-test/myapp-data │
│ appdir │ /tmp/faust-test/myapp-data/v1 │
└───────────┴──────────────────────────────────────────┘
[2019-03-21 12:03:40,048: INFO]: [^Worker]: Starting...
[2019-03-21 12:03:40,050: INFO]: [^-App]: Starting...
[2019-03-21 12:03:40,050: INFO]: [^--Monitor]: Starting...
[2019-03-21 12:03:40,050: INFO]: [^--Producer]: Starting...
[2019-03-21 12:03:40,074: INFO]: [^--Consumer]: Starting...
[2019-03-21 12:03:40,074: INFO]: [^---MethodQueue]: Starting...
[2019-03-21 12:03:40,074: INFO]: [^----MethodQueueWorker]: Starting...
[2019-03-21 12:03:40,074: INFO]: [^----MethodQueueWorker]: Starting...
[2019-03-21 12:03:40,075: INFO]: [^---ConsumerThread]: Starting...
[2019-03-21 12:03:40,090: INFO]: [^----MethodQueue]: Starting...
[2019-03-21 12:03:40,090: INFO]: [^-----MethodQueueWorker]: Starting...
[2019-03-21 12:03:40,091: INFO]: [^-----MethodQueueWorker]: Starting...
[2019-03-21 12:03:41,052: INFO]: [^--LeaderAssignor]: Starting...
[2019-03-21 12:03:41,054: INFO]: [^--Producer]: Creating topic myapp-__assignor-__leader
[2019-03-21 12:03:41,061: INFO]: [^--Producer]: Found controller: 1
[2019-03-21 12:03:41,445: INFO]: [^--Producer]: Topic myapp-__assignor-__leader created.
[2019-03-21 12:03:41,446: INFO]: [^--ReplyConsumer]: Starting...
[2019-03-21 12:03:41,446: INFO]: [^--AgentManager]: Starting...
[2019-03-21 12:03:41,446: INFO]: [^Agent: app.myagent]: Starting...
[2019-03-21 12:03:41,451: INFO]: [^-OneForOneSupervisor]: Starting...
[2019-03-21 12:03:41,452: INFO]: [^--Conductor]: Starting...
[2019-03-21 12:03:41,452: INFO]: [^--TableManager]: Starting...
[2019-03-21 12:03:42,454: INFO]: [^--Table: topic1]: Starting...
[2019-03-21 12:03:42,457: INFO]: [^---Store: topic1]: Starting...
[2019-03-21 12:03:42,458: INFO]: [^--Producer]: Creating topic myapp-topic1-changelog
[2019-03-21 12:03:42,462: INFO]: [^--Producer]: Found controller: 1
[2019-03-21 12:03:43,453: INFO]: [^--Producer]: Creating topic myapp-__assignor-__leader
[2019-03-21 12:03:43,495: INFO]: [^--Producer]: Topic myapp-topic1-changelog created.
[2019-03-21 12:03:43,496: INFO]: [^---Recovery]: Starting...
[2019-03-21 12:03:43,498: INFO]: [^--Producer]: Found controller: 1
[2019-03-21 12:03:43,508: INFO]: [^--Producer]: Creating topic myapp-topic1-changelog
[2019-03-21 12:03:43,510: INFO]: [^--Producer]: Found controller: 1
[2019-03-21 12:03:44,096: INFO]: Updating subscribed topics to: frozenset({'myapp-topic1-changelog', 'myapp-__assignor-__leader', 'topic1'})
[2019-03-21 12:03:44,101: INFO]: Subscribed to topic(s): {'myapp-topic1-changelog', 'myapp-__assignor-__leader', 'topic1'}
[2019-03-21 12:03:44,221: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:44,570: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:44,953: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:45,140: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:45,247: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:45,355: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:45,465: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:45,575: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:45,684: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:45,792: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:45,899: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:46,006: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:46,115: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:46,223: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:46,333: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:46,447: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:46,554: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:46,659: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:46,767: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:46,875: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:46,981: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:47,086: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:47,196: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:47,303: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:47,409: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:47,514: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:47,622: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:47,729: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:47,835: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:47,940: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:48,046: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:48,157: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:48,264: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:48,374: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:48,484: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:48,595: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:48,703: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:48,809: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:48,914: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:49,018: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:49,123: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:49,228: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:49,335: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:49,442: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError
[2019-03-21 12:03:49,549: INFO]: Discovered coordinator 1 for group myapp
[2019-03-21 12:03:49,551: INFO]: Revoking previously assigned partitions set() for group myapp
[2019-03-21 12:03:50,102: INFO]: (Re-)joining group myapp
[2019-03-21 12:03:50,129: INFO]: Joined group 'myapp' (generation 1) with member_id faust-1.4.9-d51e208c-59e3-4962-8221-8854d7c73beb
[2019-03-21 12:03:50,129: INFO]: Elected group leader -- performing partition assignments using faust
[2019-03-21 12:03:50,188: INFO]: Successfully synced group myapp with generation 1
[2019-03-21 12:03:50,189: INFO]: Setting newly assigned partitions
┌Topic Partition Set────────┬──────────────────────────┐
│ topic │ partitions │
├───────────────────────────┼──────────────────────────┤
│ myapp-topic1-changelog │ {0, 1, 2, 3, 4, 5, 6, 7} │
│ topic1 │ {0} │
│ myapp-__assignor-__leader │ {0} │
└───────────────────────────┴──────────────────────────┘ for group myapp
[2019-03-21 12:03:55,058: INFO]: [^---Recovery]: Highwater for active changelog partitions:
┌Highwater - Active──────┬───────────┬───────────┐
│ topic │ partition │ highwater │
├────────────────────────┼───────────┼───────────┤
│ myapp-topic1-changelog │ 0 │ -1 │
│ myapp-topic1-changelog │ 1 │ -1 │
│ myapp-topic1-changelog │ 2 │ -1 │
│ myapp-topic1-changelog │ 3 │ -1 │
│ myapp-topic1-changelog │ 4 │ -1 │
│ myapp-topic1-changelog │ 5 │ -1 │
│ myapp-topic1-changelog │ 6 │ -1 │
│ myapp-topic1-changelog │ 7 │ -1 │
└────────────────────────┴───────────┴───────────┘
[2019-03-21 12:03:56,456: INFO]: [^---Recovery]: active offsets at start of reading:
┌Reading Starts At - Active──────────┬────────┐
│ topic │ partition │ offset │
├────────────────────────┼───────────┼────────┤
│ myapp-topic1-changelog │ 0 │ -1 │
│ myapp-topic1-changelog │ 1 │ -1 │
│ myapp-topic1-changelog │ 2 │ -1 │
│ myapp-topic1-changelog │ 3 │ -1 │
│ myapp-topic1-changelog │ 4 │ -1 │
│ myapp-topic1-changelog │ 5 │ -1 │
│ myapp-topic1-changelog │ 6 │ -1 │
│ myapp-topic1-changelog │ 7 │ -1 │
└────────────────────────┴───────────┴────────┘
[2019-03-21 12:03:57,073: INFO]: [^---Recovery]: standby offsets at start of reading:
┌Reading Starts At - Standby─┐
│ topic │ partition │ offset │
└───────┴───────────┴────────┘
[2019-03-21 12:03:57,861: INFO]: [^---Recovery]: Resuming flow...
[2019-03-21 12:03:57,862: INFO]: [^---Recovery]: Recovery complete
[2019-03-21 12:03:57,963: INFO]: [^---Recovery]: Restore complete!
[2019-03-21 12:03:57,964: INFO]: [^---Recovery]: Seek stream partitions to committed offsets.
[2019-03-21 12:03:58,075: INFO]: [^--Fetcher]: Starting...
[2019-03-21 12:03:58,075: INFO]: [^---Recovery]: Worker ready
[2019-03-21 12:03:58,077: INFO]: [^Worker]: Ready
The underlying storage implementation for the table does not seem to be involved. I am experiencing the same behaviour as in my original post when using the rocksdb store instead of memory.
I created a gist containing a docker-compose file that uses the exact same versions of Zookeeper and Kafka as I am using: https://gist.github.com/seifertm/10cf65a58b08d9301382d5014fdc9121
I am also having this issue. I have tried different versions of Kafka to no avail. It only happens on the first startup of Faust when the Kafka logs are empty. In debugging, though, we clear the Kafka logs and Faust data frequently, which means we have to do a start and restart after every data purge.
Tested with Faust versions 1.4.x through 1.5.4.
Did some more testing on this. Apparently it is not necessary to clear the faust data, just the kafka logs. My suspicion is that it is related to timing (i.e. if creating the initial topics takes too long before getting to the recovery step, it will hang). I was able to recreate this from the example app in the integration tests.
Working with Faust version 1.5.4, clearing Kafka out, and running this file with the worker command causes it to get stuck at the recovery step:
from typing import List

import faust


class Point(faust.Record):
    x: int
    y: int


class Arena(faust.Record):
    points: List[Point]
    timestamp: float = None


app = faust.App('t-integration', origin='t.integration.app',
                broker='kafka://host.docker.internal:31101')

add_topic = app.topic('add-topic', internal=True)

add_table1 = app.Table('add-table', partitions=1)
add_table2 = app.Table('add-table2')
add_table3 = app.Table('add-table3')
add_table4 = app.Table('add-table4')
add_table5 = app.Table('add-table5')
add_table6 = app.Table('add-table6')

local_channel = app.channel()


@app.agent()
async def mul(stream):
    """Foo agent help."""
    async for event in stream:
        yield event * 2


@app.agent(add_topic)
async def add(stream):
    async for event in stream:
        yield event + 2


@app.agent(local_channel)
async def internal(stream):
    async for event in stream:
        yield event / 2


@app.command()
async def error_command():
    raise Exception('foo')


if __name__ == '__main__':
    app.main()
It seems to be determined only by the number of tables, not the number of topics. Doing the same thing with only one table and many topics seems to work. If it doesn't hang with the example I gave above, make sure you clear the Kafka logs before trying. If it still works, my guess would be that the machine running it is faster than the machine I'm running it on. Try adding more example tables.
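For illustration, one hypothetical way to crank the table count up in the example above; the names and the range are made up and only meant to make topic creation at startup take longer on faster machines.
# Hypothetical stress variant of the example above: register extra tables so
# the worker has more changelog topics to create on first startup. The names
# and the count are arbitrary; widen the range if it still does not hang.
extra_tables = {
    f'add-table{i}': app.Table(f'add-table{i}')
    for i in range(7, 21)
}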
A lot has changed since 1.5.4, so this could be fixed already.
@amerski99 I tried running your example a few times but was unable to reproduce. I did not have any data in the topics, so maybe that's why. It's also possible this is simply fixed in the latest version, or that it's a Kafka server version specific issue.
@ask This issue is still happening in the latest code version (v1.7.0a1). Not having data in any of the topics is OK (in fact, it's essential).
This issue only happens the first time you start Faust up when Kafka is empty. My suspicion is that on the first startup the producer takes longer to create the initial topics, and the order of things gets screwed up. Probably one step depends on another step that usually happens in order, but does not when there are too many topics to create, and there is no explicit dependency enforcing that order. My evidence for this is that with fewer tables it does not happen; six was the number of tables it took for it to happen on my machine. If my theory is correct, having a faster machine might change the number of tables required, so if it's still not happening for you with six tables, try adding more. Again, make sure it is an empty Kafka server before trying.
The Kafka version I'm using is the docker image wurstmeister/kafka:2.12-2.2.0. I've tried other versions as well. If you are still not able to recreate it, let me know what Kafka version you are using and I can try that as well.
Is there any update on this issue? I am using Faust 1.7.1 and see: [2019-10-22 00:55:49,549: ERROR]: Group Coordinator Request failed: [Error 15] CoordinatorNotAvailableError. The issue occurs when Kafka is not in a good state, but the problem is that the Faust app never recovers and I have to restart it manually. I believe there could be ways to handle these errors in my app and take recovery action. Please let me know if there are any ideas on how to handle it.
Further asking: a) Will the on_error() callback be invoked for the above error? b) Shouldn't the Faust app recover from the CoordinatorNotAvailableError on its own? If yes, are any specific settings required?
Are there any ways to catch this error in my app and write a handler?
@amitkg29 Hi, I am also hitting the same issue with my Faust app. Did you find a way to resolve it? Thanks.
I'm unable to run integration tests on my consumers because I always get this error when I start a worker for the first time.
Do you have any news on this?
I also have the same issue, but I did not get it the first time the worker started; mine started when I stopped Kafka, sent the agent some data, and then restarted the Faust app after the Kafka cluster came back up.
Same error for me.
It's a bit late but it could help someone else.
This is caused by a desync between Kafka and Zookeeper. When you destroy your Kafka, there is still topic information in Zookeeper. To solve this, you can connect to your ZK and execute a cleanup: deleteall /brokers. Be careful: this will erase all Kafka synchronisation config.
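If someone prefers to script that cleanup rather than type it into the ZooKeeper CLI, a sketch using the kazoo client might look like the following; kazoo and the connection details are assumptions, not something mentioned in this thread.
# Sketch only: recursively delete the stale /brokers node, equivalent to
# `deleteall /brokers` in zkCli. Assumes the kazoo package is installed and
# ZooKeeper is reachable at localhost:2181 (an assumption).
# WARNING: as noted above, this erases all Kafka synchronisation config.
from kazoo.client import KazooClient

zk = KazooClient(hosts='localhost:2181')
zk.start()
try:
    zk.delete('/brokers', recursive=True)
finally:
    zk.stop()
    zk.close()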