
Faust app hangs waiting for recovery


I am experiencing a problem that prevents my Faust app from starting properly. The issue occurs with Python 3.6 and Python 3.7 but only on my CI server. I cannot reproduce the issue on my working machine.

Any ideas how to get to the bottom of this?

Checklist

  • [x] I have included information about relevant versions
  • [ ] I have verified that the issue persists when using the master branch of Faust.

Steps to reproduce

  • Start a fresh Kafka instance (this is important: subsequent runs will succeed)
  • Run the minimal example with the provided commands

Here is a minimal example:

import faust

app = faust.App('myapp', broker='kafka://localhost:9094', store='memory://')

topic1 = app.topic('topic1')
topic2 = app.topic('topic2') # Comment this line to make the app start without hanging

topic1_table = app.Table('topic1')
topic2_table = app.Table('topic2') # Comment this line to make the app start without hanging

@app.agent(topic1)
async def myagent(stream):
    async for item in stream:
        print(item)

Start it with faust --loglevel debug --debug --app api.app worker --without-web.

Expected behavior

App starts normally.

Actual behavior

App hangs when starting.

Full traceback

[2018-11-21 12:16:42,195: WARNING]: [^Worker]: Cannot start console: aiomonitor is not installed
[2018-11-21 12:16:42,195: INFO]: [^Worker]: Starting...
[2018-11-21 12:16:42,196: INFO]: [^-App]: Starting...
[2018-11-21 12:16:42,196: INFO]: [^--Monitor]: Starting...
[2018-11-21 12:16:42,196: DEBUG]: [^--Monitor]: Started.
[2018-11-21 12:16:42,196: INFO]: [^--Producer]: Starting...
[2018-11-21 12:16:42,196: DEBUG]: Starting the Kafka producer
[2018-11-21 12:16:42,196: DEBUG]: Attempting to bootstrap via node at localhost:9094
[2018-11-21 12:16:42,205: DEBUG]: <AIOKafkaConnection host=localhost port=9094> Response 1: MetadataResponse_v0(brokers=[(node_id=1, host='localhost', port=9094)], topics=[])
[2018-11-21 12:16:42,205: DEBUG]: Updated cluster metadata to ClusterMetadata(brokers: 1, topics: 0, groups: 0)
[2018-11-21 12:16:42,206: DEBUG]: Closing connection at localhost:9094
[2018-11-21 12:16:42,206: DEBUG]: Received cluster metadata: ClusterMetadata(brokers: 1, topics: 0, groups: 0)
[2018-11-21 12:16:42,206: DEBUG]: Initiating connection to node 1 at localhost:9094
[2018-11-21 12:16:42,210: DEBUG]: <AIOKafkaConnection host=localhost port=9094> Response 1: ApiVersionResponse_v0(error_code=0, api_versions=[(api_key=0, min_version=0, max_version=5), (api_key=1, min_version=0, max_version=6), (api_key=2, min_version=0, max_version=2), (api_key=3, min_version=0, max_version=5), (api_key=4, min_version=0, max_version=1), (api_key=5, min_version=0, max_version=0), (api_key=6, min_version=0, max_version=4), (api_key=7, min_version=0, max_version=1), (api_key=8, min_version=0, max_version=3), (api_key=9, min_version=0, max_version=3), (api_key=10, min_version=0, max_version=1), (api_key=11, min_version=0, max_version=2), (api_key=12, min_version=0, max_version=1), (api_key=13, min_version=0, max_version=1), (api_key=14, min_version=0, max_version=1), (api_key=15, min_version=0, max_version=1), (api_key=16, min_version=0, max_version=1), (api_key=17, min_version=0, max_version=1), (api_key=18, min_version=0, max_version=1), (api_key=19, min_version=0, max_version=2), (api_key=20, min_version=0, max_version=1), (api_key=21, min_version=0, max_version=0), (api_key=22, min_version=0, max_version=0), (api_key=23, min_version=0, max_version=0), (api_key=24, min_version=0, max_version=0), (api_key=25, min_version=0, max_version=0), (api_key=26, min_version=0, max_version=0), (api_key=27, min_version=0, max_version=0), (api_key=28, min_version=0, max_version=0), (api_key=29, min_version=0, max_version=0), (api_key=30, min_version=0, max_version=0), (api_key=31, min_version=0, max_version=0), (api_key=32, min_version=0, max_version=0), (api_key=33, min_version=0, max_version=0), (api_key=34, min_version=0, max_version=0), (api_key=35, min_version=0, max_version=0), (api_key=36, min_version=0, max_version=0), (api_key=37, min_version=0, max_version=0)])
[2018-11-21 12:16:42,212: DEBUG]: <AIOKafkaConnection host=localhost port=9094> Response 2: MetadataResponse_v0(brokers=[(node_id=1, host='localhost', port=9094)], topics=[])
[2018-11-21 12:16:42,212: DEBUG]: Closing connection at localhost:9094
[2018-11-21 12:16:42,212: DEBUG]: Kafka producer started
[2018-11-21 12:16:42,212: DEBUG]: [^--Producer]: Started.
[2018-11-21 12:16:42,212: INFO]: [^--Consumer]: Starting...
[2018-11-21 12:16:42,212: DEBUG]: Attempting to bootstrap via node at localhost:9094
[2018-11-21 12:16:42,214: DEBUG]: <AIOKafkaConnection host=localhost port=9094> Response 1: MetadataResponse_v0(brokers=[(node_id=1, host='localhost', port=9094)], topics=[])
[2018-11-21 12:16:42,215: DEBUG]: Updated cluster metadata to ClusterMetadata(brokers: 1, topics: 0, groups: 0)
[2018-11-21 12:16:42,215: DEBUG]: Closing connection at localhost:9094
[2018-11-21 12:16:42,215: DEBUG]: Received cluster metadata: ClusterMetadata(brokers: 1, topics: 0, groups: 0)
[2018-11-21 12:16:42,215: DEBUG]: Initiating connection to node 1 at localhost:9094
[2018-11-21 12:16:42,217: DEBUG]: <AIOKafkaConnection host=localhost port=9094> Response 1: ApiVersionResponse_v0(error_code=0, api_versions=[(api_key=0, min_version=0, max_version=5), (api_key=1, min_version=0, max_version=6), (api_key=2, min_version=0, max_version=2), (api_key=3, min_version=0, max_version=5), (api_key=4, min_version=0, max_version=1), (api_key=5, min_version=0, max_version=0), (api_key=6, min_version=0, max_version=4), (api_key=7, min_version=0, max_version=1), (api_key=8, min_version=0, max_version=3), (api_key=9, min_version=0, max_version=3), (api_key=10, min_version=0, max_version=1), (api_key=11, min_version=0, max_version=2), (api_key=12, min_version=0, max_version=1), (api_key=13, min_version=0, max_version=1), (api_key=14, min_version=0, max_version=1), (api_key=15, min_version=0, max_version=1), (api_key=16, min_version=0, max_version=1), (api_key=17, min_version=0, max_version=1), (api_key=18, min_version=0, max_version=1), (api_key=19, min_version=0, max_version=2), (api_key=20, min_version=0, max_version=1), (api_key=21, min_version=0, max_version=0), (api_key=22, min_version=0, max_version=0), (api_key=23, min_version=0, max_version=0), (api_key=24, min_version=0, max_version=0), (api_key=25, min_version=0, max_version=0), (api_key=26, min_version=0, max_version=0), (api_key=27, min_version=0, max_version=0), (api_key=28, min_version=0, max_version=0), (api_key=29, min_version=0, max_version=0), (api_key=30, min_version=0, max_version=0), (api_key=31, min_version=0, max_version=0), (api_key=32, min_version=0, max_version=0), (api_key=33, min_version=0, max_version=0), (api_key=34, min_version=0, max_version=0), (api_key=35, min_version=0, max_version=0), (api_key=36, min_version=0, max_version=0), (api_key=37, min_version=0, max_version=0)])
[2018-11-21 12:16:42,218: DEBUG]: <AIOKafkaConnection host=localhost port=9094> Response 2: MetadataResponse_v0(brokers=[(node_id=1, host='localhost', port=9094)], topics=[])
[2018-11-21 12:16:42,218: DEBUG]: Closing connection at localhost:9094
[2018-11-21 12:16:42,219: DEBUG]: [^--Consumer]: Started.
[2018-11-21 12:16:42,219: INFO]: [^--LeaderAssignor]: Starting...
[2018-11-21 12:16:42,219: INFO]: [^--Producer]: Creating topic myapp-__assignor-__leader
[2018-11-21 12:16:42,219: DEBUG]: Initiating connection to node 1 at localhost:9094
[2018-11-21 12:16:42,222: DEBUG]: <AIOKafkaConnection host=localhost port=9094> Response 1: MetadataResponse_v1(brokers=[(node_id=1, host='localhost', port=9094, rack=None)], controller_id=1, topics=[])
[2018-11-21 12:16:42,222: INFO]: [^--Producer]: Found controller: 1
[2018-11-21 12:16:42,761: DEBUG]: <AIOKafkaConnection host=localhost port=9094> Response 2: CreateTopicsResponse_v1(topic_error_codes=[(topic='myapp-__assignor-__leader', error_code=0, error_message=None)])
[2018-11-21 12:16:42,761: INFO]: [^--Producer]: Topic myapp-__assignor-__leader created.
[2018-11-21 12:16:42,761: DEBUG]: [^--LeaderAssignor]: Started.
[2018-11-21 12:16:42,761: INFO]: [^--ReplyConsumer]: Starting...
[2018-11-21 12:16:42,761: DEBUG]: [^--ReplyConsumer]: Started.
[2018-11-21 12:16:42,761: INFO]: [^AgentManager]: Starting...
[2018-11-21 12:16:42,762: INFO]: [^Agent: api.app.myagent]: Starting...
[2018-11-21 12:16:42,763: INFO]: [^-OneForOneSupervisor]: Starting...
[2018-11-21 12:16:42,763: DEBUG]: [^-Agent*: api.app.myagent]: Starting...
[2018-11-21 12:16:42,763: DEBUG]: [^-Agent*: api.app.myagent]: Started.
[2018-11-21 12:16:42,763: DEBUG]: [^-OneForOneSupervisor]: Started.
[2018-11-21 12:16:42,763: DEBUG]: [^Agent: api.app.myagent]: Started.
[2018-11-21 12:16:42,763: DEBUG]: [^AgentManager]: Started.
[2018-11-21 12:16:42,763: INFO]: [^--Conductor]: Starting...
[2018-11-21 12:16:42,763: DEBUG]: [^--Conductor]: Started.
[2018-11-21 12:16:42,763: INFO]: [^--TableManager]: Starting...
[2018-11-21 12:16:42,764: DEBUG]: [^--Stream: Topic: topic1]: Starting...
[2018-11-21 12:16:42,764: DEBUG]: [^--Stream: Topic: topic1]: Started.
[2018-11-21 12:16:43,765: INFO]: [^--Table: topic1]: Starting...
[2018-11-21 12:16:43,766: INFO]: [^--Producer]: Creating topic myapp-topic1-changelog
[2018-11-21 12:16:43,771: DEBUG]: <AIOKafkaConnection host=localhost port=9094> Response 3: MetadataResponse_v1(brokers=[(node_id=1, host='localhost', port=9094, rack=None)], controller_id=1, topics=[])
[2018-11-21 12:16:43,772: INFO]: [^--Producer]: Found controller: 1
[2018-11-21 12:16:44,766: INFO]: [^--Producer]: Creating topic myapp-__assignor-__leader
[2018-11-21 12:16:45,165: DEBUG]: <AIOKafkaConnection host=localhost port=9094> Response 4: CreateTopicsResponse_v1(topic_error_codes=[(topic='myapp-topic1-changelog', error_code=0, error_message=None)])
[2018-11-21 12:16:45,166: INFO]: [^--Producer]: Topic myapp-topic1-changelog created.
[2018-11-21 12:16:45,166: DEBUG]: [^--Table: topic1]: Started.
[2018-11-21 12:16:45,166: INFO]: [^--Table: topic2]: Starting...
[2018-11-21 12:16:45,166: INFO]: [^--Producer]: Creating topic myapp-topic2-changelog
[2018-11-21 12:16:45,167: DEBUG]: <AIOKafkaConnection host=localhost port=9094> Response 5: MetadataResponse_v1(brokers=[(node_id=1, host='localhost', port=9094, rack=None)], controller_id=1, topics=[])
[2018-11-21 12:16:45,167: INFO]: [^--Producer]: Found controller: 1
[2018-11-21 12:16:45,167: DEBUG]: <AIOKafkaConnection host=localhost port=9094> Response 6: MetadataResponse_v1(brokers=[(node_id=1, host='localhost', port=9094, rack=None)], controller_id=1, topics=[])
[2018-11-21 12:16:45,168: INFO]: [^--Producer]: Found controller: 1
[2018-11-21 12:16:45,173: DEBUG]: <AIOKafkaConnection host=localhost port=9094> Response 7: CreateTopicsResponse_v1(topic_error_codes=[(topic='myapp-__assignor-__leader', error_code=36, error_message="Topic 'myapp-__assignor-__leader' already exists.")])
[2018-11-21 12:16:45,173: DEBUG]: [^--Producer]: Topic myapp-__assignor-__leader exists, skipping creation.
[2018-11-21 12:16:46,494: DEBUG]: <AIOKafkaConnection host=localhost port=9094> Response 8: CreateTopicsResponse_v1(topic_error_codes=[(topic='myapp-topic2-changelog', error_code=0, error_message=None)])
[2018-11-21 12:16:46,494: INFO]: [^--Producer]: Topic myapp-topic2-changelog created.
[2018-11-21 12:16:46,494: DEBUG]: [^--Table: topic2]: Started.
[2018-11-21 12:16:46,495: INFO]: [^---Recovery]: Starting...
[2018-11-21 12:16:46,495: DEBUG]: [^---Recovery]: Started.
[2018-11-21 12:16:46,495: DEBUG]: [^--TableManager]: Started.
[2018-11-21 12:16:46,495: DEBUG]: [^-App]: Started.
[2018-11-21 12:16:46,495: INFO]: [^---Recovery]: WAITING FOR NEXT RECOVERY TO START
[2018-11-21 12:17:26,124: INFO]: [^Worker]: Stopping on signal received...
[2018-11-21 12:17:26,124: INFO]: [^Worker]: Stopping...
[2018-11-21 12:17:26,124: INFO]: [^-App]: Stopping...
[2018-11-21 12:17:26,124: INFO]: [^-App]: Flush producer buffer...
[2018-11-21 12:17:26,124: INFO]: [^--TableManager]: Stopping...
[2018-11-21 12:17:26,124: INFO]: [^--Fetcher]: Stopping...
[2018-11-21 12:17:26,125: DEBUG]: [^--Fetcher]: Shutting down...
[2018-11-21 12:17:26,125: INFO]: [^--Fetcher]: -Stopped!
[2018-11-21 12:17:26,125: INFO]: [^---Recovery]: Stopping...
[2018-11-21 12:17:26,125: DEBUG]: [^---Recovery]: Shutting down...
[2018-11-21 12:17:26,125: DEBUG]: [^Worker]: Started.
[2018-11-21 12:17:26,126: INFO]: [^---Recovery]: Terminating cancelled task: <coroutine object Recovery._slurp_changelogs at 0x7f12afaac448>
[2018-11-21 12:17:26,126: INFO]: [^Worker]: Gathering service tasks...
[2018-11-21 12:17:26,126: INFO]: [^Worker]: Gathering all futures...
[2018-11-21 12:17:26,127: INFO]: [^---Recovery]: -Stopped!
[2018-11-21 12:17:26,127: INFO]: [^--Table: topic1]: Stopping...
[2018-11-21 12:17:26,127: DEBUG]: [^--Table: topic1]: Shutting down...
[2018-11-21 12:17:26,127: INFO]: [^--Table: topic1]: -Stopped!
[2018-11-21 12:17:26,127: INFO]: [^--Table: topic2]: Stopping...
[2018-11-21 12:17:26,127: DEBUG]: [^--Table: topic2]: Shutting down...
[2018-11-21 12:17:26,127: INFO]: [^--Table: topic2]: -Stopped!
[2018-11-21 12:17:26,127: DEBUG]: [^--TableManager]: Shutting down...
[2018-11-21 12:17:26,127: INFO]: [^--TableManager]: -Stopped!
[2018-11-21 12:17:26,127: INFO]: [^--Conductor]: Stopping...
[2018-11-21 12:17:26,127: DEBUG]: [^--Conductor]: Shutting down...
[2018-11-21 12:17:26,128: INFO]: [^--Conductor]: -Stopped!
[2018-11-21 12:17:26,128: INFO]: [^AgentManager]: Stopping...
[2018-11-21 12:17:26,128: INFO]: [^Agent: api.app.myagent]: Stopping...
[2018-11-21 12:17:26,128: INFO]: [^-OneForOneSupervisor]: Stopping...
[2018-11-21 12:17:26,128: DEBUG]: [^-Agent*: api.app.myagent]: Stopping...
[2018-11-21 12:17:26,128: DEBUG]: [^-Agent*: api.app.myagent]: Shutting down...
[2018-11-21 12:17:26,128: DEBUG]: [^-Agent*: api.app.myagent]: Terminating cancelled task: <Task cancelled coro=<Agent._execute_task() done, defined at /var/lib/laminar/run/ameto/2277/ameto/api/venv/lib/python3.7/site-packages/faust/agents/agent.py:536>>
[2018-11-21 12:17:26,129: DEBUG]: [^-Agent*: api.app.myagent]: -Stopped!
[2018-11-21 12:17:26,129: DEBUG]: [^-OneForOneSupervisor]: Shutting down...
[2018-11-21 12:17:26,129: INFO]: [^-OneForOneSupervisor]: Terminating cancelled task: <coroutine object SupervisorStrategy._supervisor at 0x7f12afa93448>
[2018-11-21 12:17:26,129: INFO]: [^-OneForOneSupervisor]: -Stopped!
[2018-11-21 12:17:26,130: DEBUG]: [^Agent: api.app.myagent]: Shutting down...
[2018-11-21 12:17:26,130: INFO]: [^Agent: api.app.myagent]: -Stopped!
[2018-11-21 12:17:26,130: DEBUG]: [^AgentManager]: Shutting down...
[2018-11-21 12:17:26,130: INFO]: [^AgentManager]: -Stopped!
[2018-11-21 12:17:26,130: INFO]: [^--ReplyConsumer]: Stopping...
[2018-11-21 12:17:26,130: DEBUG]: [^--ReplyConsumer]: Shutting down...
[2018-11-21 12:17:26,130: INFO]: [^--ReplyConsumer]: -Stopped!
[2018-11-21 12:17:26,130: INFO]: [^--LeaderAssignor]: Stopping...
[2018-11-21 12:17:26,130: DEBUG]: [^--LeaderAssignor]: Shutting down...
[2018-11-21 12:17:26,130: INFO]: [^--LeaderAssignor]: -Stopped!
[2018-11-21 12:17:26,130: INFO]: [^--Consumer]: Stopping...
[2018-11-21 12:17:26,131: INFO]: [^--Consumer]: COMMITTING AGAIN AFTER STREAMS DONE
[2018-11-21 12:17:26,131: DEBUG]: Closing the KafkaConsumer.
[2018-11-21 12:17:26,131: DEBUG]: Closing connection at localhost:9094
[2018-11-21 12:17:26,131: DEBUG]: The KafkaConsumer has closed.
[2018-11-21 12:17:26,131: DEBUG]: [^--Consumer]: Shutting down...
[2018-11-21 12:17:26,131: INFO]: [^--Consumer]: -Stopped!
[2018-11-21 12:17:26,132: INFO]: [^--Producer]: Stopping...
[2018-11-21 12:17:26,132: DEBUG]: Closing connection at localhost:9094
[2018-11-21 12:17:26,132: DEBUG]: The Kafka producer has closed.
[2018-11-21 12:17:26,132: DEBUG]: [^--Producer]: Shutting down...
[2018-11-21 12:17:26,132: INFO]: [^--Producer]: -Stopped!
[2018-11-21 12:17:26,132: INFO]: [^--Monitor]: Stopping...
[2018-11-21 12:17:26,132: DEBUG]: [^--Monitor]: Shutting down...
[2018-11-21 12:17:26,132: INFO]: [^--Monitor]: -Stopped!
[2018-11-21 12:17:26,133: DEBUG]: [^-App]: Shutting down...
[2018-11-21 12:17:26,133: INFO]: [^-App]: -Stopped!
[2018-11-21 12:17:26,133: DEBUG]: [^Worker]: Shutting down...
[2018-11-21 12:17:26,133: INFO]: [^Worker]: -Stopped!
[2018-11-21 12:17:27,230: INFO]: [^Worker]: Closing event loop

Versions

  • Python version: 3.6.6 and 3.7
  • Faust version: 1.3.2
  • Operating system: Linux
  • Kafka version: 1.0.1
  • RocksDB version (if applicable)

seifertm avatar Nov 21 '18 11:11 seifertm

Can you try setting:

App(stream_wait_empty=False, ...) ?
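
Something like this, applied to your minimal example (a sketch; broker and store values are taken from your report — stream_wait_empty controls whether the worker waits for streams to drain before rebalancing):

import faust

app = faust.App(
    'myapp',
    broker='kafka://localhost:9094',
    store='memory://',
    stream_wait_empty=False,  # don't wait for streams to drain before rebalancing
)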

ask avatar Nov 21 '18 23:11 ask

Adding stream_wait_empty=False does not make a difference.

seifertm avatar Nov 22 '18 10:11 seifertm

However, it does work with Kafka 2.1.0. Could this be an issue with the AdminClient implementation, or a Kafka bug?

seifertm avatar Nov 22 '18 10:11 seifertm

Maybe I dismissed the issue too quickly: as soon as I add more topics and changelog topics to the example code above, the application starts hanging again during initialization, even with Kafka 2.1.0. Right now, I am testing with four topics and four tables; a sketch of that variant is below.
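
For the record, the scaled-up variant of the minimal example looks roughly like this (a sketch; the names are illustrative):

import faust

app = faust.App('myapp', broker='kafka://localhost:9094', store='memory://')

# four topics and four tables instead of two; enough to trigger the hang here
topics = [app.topic(f'topic{i}') for i in range(1, 5)]
tables = [app.Table(f'topic{i}') for i in range(1, 5)]

@app.agent(topics[0])
async def myagent(stream):
    async for item in stream:
        print(item)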

seifertm avatar Nov 28 '18 09:11 seifertm

A git bisect states that aa15f93bb03a8631d0e96a95b8b9a07668ae191f is the first bad commit.

Also: when all topics and changelog topics are already present in the Kafka cluster and Faust does not have to create them, the application starts up fine. The app initialization hangs only when Faust has to create the topics.
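
If topic creation is indeed the trigger, pre-creating the changelog topics before the first worker start might work around it (an untested sketch, assuming Kafka's bundled kafka-topics.sh and a ZooKeeper at localhost:2181; the partition count has to match Faust's topic_partitions setting, which defaults to 8, and changelog topics are compacted):

$ kafka-topics.sh --zookeeper localhost:2181 --create \
    --topic myapp-topic1-changelog --partitions 8 \
    --replication-factor 1 --config cleanup.policy=compact
$ kafka-topics.sh --zookeeper localhost:2181 --create \
    --topic myapp-topic2-changelog --partitions 8 \
    --replication-factor 1 --config cleanup.policy=compact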

seifertm avatar Nov 28 '18 14:11 seifertm

Hmm. When I reset the state of my local Kafka it still works, but maybe it depends on the Kafka version.

topic2 is unused by any agent, so I'm not sure how commenting that line out would make a difference? table1 + table2 will have an effect on startup, but does it really work if you comment out one of the tables? (Is having two tables necessary to reproduce?)

Here's what happens when the topics do not exist for me:

(def) /opt/devel/faust(master)$> python examples/withdrawals.py worker -l info
┌ƒaµS† v1.4.2────────────────────────────────────────────┐
│ id        │ faust-withdrawals-v3                       │
│ transport │ [URL('aiokafka:')]                         │
│ store     │ rocksdb:                                   │
│ web       │ http://localhost:6066/                     │
│ log       │ -stderr- (info)                            │
│ pid       │ 75061                                      │
│ hostname  │ grainstate.local                           │
│ platform  │ CPython 3.7.0 (Darwin x86_64)              │
│ drivers   │ aiokafka=0.5.1 aiohttp=3.5.2               │
│ datadir   │ /opt/devel/faust/faust-withdrawals-data    │
│ appdir    │ /opt/devel/faust/faust-withdrawals-data/v3 │
└───────────┴────────────────────────────────────────────┘
[2019-01-14 16:47:42,159: INFO]: [^Worker]: Starting...
[2019-01-14 16:47:42,162: INFO]: [^-App]: Starting...
[2019-01-14 16:47:42,162: INFO]: [^--Monitor]: Starting...
[2019-01-14 16:47:42,163: INFO]: [^--Producer]: Starting...
[2019-01-14 16:47:42,176: INFO]: [^CacheBackend]: Starting...
[2019-01-14 16:47:42,176: INFO]: [^Web]: Starting...
[2019-01-14 16:47:42,189: INFO]: [^-Server]: Starting...
[2019-01-14 16:47:42,190: INFO]: [^--Consumer]: Starting...
[2019-01-14 16:47:42,190: INFO]: [^---MethodQueue]: Starting...
[2019-01-14 16:47:42,191: INFO]: [^---AIOKafkaConsumerThread]: Starting...
[2019-01-14 16:47:42,197: INFO]: [^----MethodQueue]: Starting...
[2019-01-14 16:47:43,164: INFO]: [^--LeaderAssignor]: Starting...
[2019-01-14 16:47:43,164: INFO]: [^--Producer]: Creating topic faust-withdrawals-v3-__assignor-__leader
[2019-01-14 16:47:43,168: INFO]: [^--Producer]: Found controller: 0
[2019-01-14 16:47:43,328: INFO]: [^--Producer]: Topic faust-withdrawals-v3-__assignor-__leader created.
[2019-01-14 16:47:43,329: INFO]: [^--ReplyConsumer]: Starting...
[2019-01-14 16:47:43,329: INFO]: [^AgentManager]: Starting...
[2019-01-14 16:47:43,329: INFO]: [^Agent: __main__.track_user_withdrawal]: Starting...
[2019-01-14 16:47:43,330: INFO]: [^-OneForOneSupervisor]: Starting...
[2019-01-14 16:47:43,331: INFO]: [^Agent: __main__.track_country_withdrawal]: Starting...
[2019-01-14 16:47:43,332: INFO]: [^-OneForOneSupervisor]: Starting...
[2019-01-14 16:47:43,332: INFO]: [^--Conductor]: Starting...
[2019-01-14 16:47:43,332: INFO]: [^--TableManager]: Starting...
[2019-01-14 16:47:43,333: INFO]: [^--Producer]: Creating topic faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal-faust-withdrawals-v3-Withdrawal.country-repartition
[2019-01-14 16:47:43,334: INFO]: [^--Producer]: Found controller: 0
[2019-01-14 16:47:43,390: INFO]: [^--Producer]: Topic faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal-faust-withdrawals-v3-Withdrawal.country-repartition created.
[2019-01-14 16:47:44,334: INFO]: [^--Table: user_to_total]: Starting...
[2019-01-14 16:47:44,427: INFO]: [^---Store: user_to_total]: Starting...
[2019-01-14 16:47:44,428: INFO]: [^--Producer]: Creating topic faust-withdrawals-v3-user_to_total-changelog
[2019-01-14 16:47:44,430: INFO]: [^--Producer]: Found controller: 0
[2019-01-14 16:47:44,501: INFO]: [^--Producer]: Topic faust-withdrawals-v3-user_to_total-changelog created.
[2019-01-14 16:47:44,502: INFO]: [^--Table: country_to_total]: Starting...
[2019-01-14 16:47:44,502: INFO]: [^---Store: country_to_total]: Starting...
[2019-01-14 16:47:44,503: INFO]: [^--Producer]: Creating topic faust-withdrawals-v3-country_to_total-changelog
[2019-01-14 16:47:44,504: INFO]: [^--Producer]: Found controller: 0
[2019-01-14 16:47:44,565: INFO]: [^--Producer]: Topic faust-withdrawals-v3-country_to_total-changelog created.
[2019-01-14 16:47:44,566: INFO]: [^---Recovery]: Starting...
[2019-01-14 16:47:45,334: INFO]: [^--Producer]: Creating topic faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal
[2019-01-14 16:47:45,337: INFO]: [^--Producer]: Found controller: 0
[2019-01-14 16:47:45,396: INFO]: [^--Producer]: Topic faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal created.
[2019-01-14 16:47:45,396: INFO]: [^--Producer]: Creating topic faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal-faust-withdrawals-v3-Withdrawal.country-repartition
[2019-01-14 16:47:45,397: INFO]: [^--Producer]: Found controller: 0
[2019-01-14 16:47:45,406: INFO]: [^--Producer]: Creating topic faust-withdrawals-v3-__assignor-__leader
[2019-01-14 16:47:45,407: INFO]: [^--Producer]: Found controller: 0
[2019-01-14 16:47:45,410: INFO]: [^--Producer]: Creating topic faust-withdrawals-v3-country_to_total-changelog
[2019-01-14 16:47:45,412: INFO]: [^--Producer]: Found controller: 0
[2019-01-14 16:47:45,415: INFO]: [^--Producer]: Creating topic faust-withdrawals-v3-user_to_total-changelog
[2019-01-14 16:47:45,416: INFO]: [^--Producer]: Found controller: 0
[2019-01-14 16:47:46,200: INFO]: Updating subscribed topics to: frozenset({'faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal', 'faust-withdrawals-v3-user_to_total-changelog', 'faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal-faust-withdrawals-v3-Withdrawal.country-repartition', 'faust-withdrawals-v3-__assignor-__leader', 'faust-withdrawals-v3-country_to_total-changelog', 'withdrawals'})
[2019-01-14 16:47:46,201: INFO]: Subscribed to topic(s): {'faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal', 'faust-withdrawals-v3-user_to_total-changelog', 'faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal-faust-withdrawals-v3-Withdrawal.country-repartition', 'faust-withdrawals-v3-__assignor-__leader', 'faust-withdrawals-v3-country_to_total-changelog', 'withdrawals'}
[2019-01-14 16:47:46,229: ERROR]: Group Coordinator Request failed: [Error 15] CoordinatorNotAvailableError
[2019-01-14 16:47:46,250: WARNING]: Topic 'withdrawals' is not available during auto-create initialization
[2019-01-14 16:47:46,356: ERROR]: Group Coordinator Request failed: [Error 15] CoordinatorNotAvailableError
[2019-01-14 16:47:46,362: WARNING]: Topic 'withdrawals' is not available during auto-create initialization
[2019-01-14 16:47:46,467: ERROR]: Group Coordinator Request failed: [Error 15] CoordinatorNotAvailableError
[2019-01-14 16:47:46,473: WARNING]: Topic 'withdrawals' is not available during auto-create initialization
[2019-01-14 16:47:46,580: ERROR]: Group Coordinator Request failed: [Error 15] CoordinatorNotAvailableError
[2019-01-14 16:47:46,585: WARNING]: Topic 'withdrawals' is not available during auto-create initialization
[2019-01-14 16:47:46,692: INFO]: Discovered coordinator 0 for group faust-withdrawals-v3
[2019-01-14 16:47:46,693: INFO]: Revoking previously assigned partitions set() for group faust-withdrawals-v3
[2019-01-14 16:47:47,201: INFO]: (Re-)joining group faust-withdrawals-v3
[2019-01-14 16:47:47,248: INFO]: Joined group 'faust-withdrawals-v3' (generation 1) with member_id faust-1.4.2-860504a6-a071-4a95-8591-9f3f34e0ff8b
[2019-01-14 16:47:47,248: INFO]: Elected group leader -- performing partition assignments using faust
[2019-01-14 16:47:47,249: WARNING]: Ignoring missing topic: withdrawals
[2019-01-14 16:47:47,294: INFO]: Successfully synced group faust-withdrawals-v3 with generation 1
[2019-01-14 16:47:47,294: ERROR]: Rejoining group -- Need to rejoin! -- Topics not yet created: {'withdrawals'}
[2019-01-14 16:47:47,297: INFO]: (Re-)joining group faust-withdrawals-v3
[2019-01-14 16:47:47,301: INFO]: Joined group 'faust-withdrawals-v3' (generation 2) with member_id faust-1.4.2-860504a6-a071-4a95-8591-9f3f34e0ff8b
[2019-01-14 16:47:47,301: INFO]: Elected group leader -- performing partition assignments using faust
[2019-01-14 16:47:47,304: INFO]: Successfully synced group faust-withdrawals-v3 with generation 2
[2019-01-14 16:47:47,305: INFO]: Setting newly assigned partitions {TopicPartition(topic='faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal-faust-withdrawals-v3-Withdrawal.country-repartition', partition=3), TopicPartition(topic='withdrawals', partition=3), TopicPartition(topic='faust-withdrawals-v3-country_to_total-changelog', partition=2), TopicPartition(topic='faust-withdrawals-v3-user_to_total-changelog', partition=0), TopicPartition(topic='faust-withdrawals-v3-__assignor-__leader', partition=0), TopicPartition(topic='faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal-faust-withdrawals-v3-Withdrawal.country-repartition', partition=1), TopicPartition(topic='faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal', partition=1), TopicPartition(topic='withdrawals', partition=1), TopicPartition(topic='faust-withdrawals-v3-user_to_total-changelog', partition=2), TopicPartition(topic='faust-withdrawals-v3-country_to_total-changelog', partition=0), TopicPartition(topic='faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal', partition=3), TopicPartition(topic='withdrawals', partition=2), TopicPartition(topic='faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal-faust-withdrawals-v3-Withdrawal.country-repartition', partition=2), TopicPartition(topic='faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal', partition=0), TopicPartition(topic='withdrawals', partition=0), TopicPartition(topic='faust-withdrawals-v3-user_to_total-changelog', partition=1), TopicPartition(topic='faust-withdrawals-v3-country_to_total-changelog', partition=3), TopicPartition(topic='faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal-faust-withdrawals-v3-Withdrawal.country-repartition', partition=0), TopicPartition(topic='faust-withdrawals-v3-examples.withdrawals.track_country_withdrawal', partition=2), TopicPartition(topic='faust-withdrawals-v3-country_to_total-changelog', partition=1), TopicPartition(topic='faust-withdrawals-v3-user_to_total-changelog', partition=3)} for group faust-withdrawals-v3
[2019-01-14 16:47:52,194: INFO]: [^---Recovery]: Highwater for active changelog partitions:
┌Highwater - Active───────────────────────────────┬───────────┬───────────┐
│ topic                                           │ partition │ highwater │
├─────────────────────────────────────────────────┼───────────┼───────────┤
│ faust-withdrawals-v3-user_to_total-changelog    │ 0         │ -1        │
│ faust-withdrawals-v3-user_to_total-changelog    │ 1         │ -1        │
│ faust-withdrawals-v3-user_to_total-changelog    │ 2         │ -1        │
│ faust-withdrawals-v3-user_to_total-changelog    │ 3         │ -1        │
│ faust-withdrawals-v3-country_to_total-changelog │ 0         │ -1        │
│ faust-withdrawals-v3-country_to_total-changelog │ 1         │ -1        │
│ faust-withdrawals-v3-country_to_total-changelog │ 2         │ -1        │
│ faust-withdrawals-v3-country_to_total-changelog │ 3         │ -1        │
└─────────────────────────────────────────────────┴───────────┴───────────┘
[2019-01-14 16:47:53,510: INFO]: [^---Recovery]: active offsets at start of reading:
┌Reading Starts At - Active───────────────────────┬───────────┬────────┐
│ topic                                           │ partition │ offset │
├─────────────────────────────────────────────────┼───────────┼────────┤
│ faust-withdrawals-v3-country_to_total-changelog │ 2         │ -1     │
│ faust-withdrawals-v3-user_to_total-changelog    │ 2         │ -1     │
│ faust-withdrawals-v3-user_to_total-changelog    │ 1         │ -1     │
│ faust-withdrawals-v3-country_to_total-changelog │ 1         │ -1     │
│ faust-withdrawals-v3-user_to_total-changelog    │ 0         │ -1     │
│ faust-withdrawals-v3-country_to_total-changelog │ 0         │ -1     │
│ faust-withdrawals-v3-country_to_total-changelog │ 3         │ -1     │
│ faust-withdrawals-v3-user_to_total-changelog    │ 3         │ -1     │
└─────────────────────────────────────────────────┴───────────┴────────┘
[2019-01-14 16:47:54,201: INFO]: [^---Recovery]: standby offsets at start of reading:
┌Reading Starts At - Standby─┐
│ topic │ partition │ offset │
└───────┴───────────┴────────┘
[2019-01-14 16:47:54,569: INFO]: [^---Recovery]: Resuming flow...
[2019-01-14 16:47:54,569: INFO]: [^---Recovery]: Recovery complete
[2019-01-14 16:47:54,670: INFO]: [^---Recovery]: Restore complete!
[2019-01-14 16:47:54,671: INFO]: [^---Recovery]: Seek stream partitions to committed offsets.
[2019-01-14 16:47:55,203: INFO]: [^--Fetcher]: Starting...
[2019-01-14 16:47:55,203: INFO]: [^---Recovery]: Worker ready
[2019-01-14 16:47:55,204: INFO]: [^Worker]: Ready

ask avatar Jan 15 '19 00:01 ask

I finally got around to coming back to this and tested with Faust 1.4.8 and 1.4.9. The issue is still reproducible in both versions.

The second table is necessary to reproduce the issue. With the second table, I am getting the following log output:

$ faust --loglevel info --app app.app worker --without-web
┌ƒaµS† v1.4.9──────────────────────────────────────────┐
│ id        │ myapp                                    │
│ transport │ [URL('kafka://localhost:9092')]          │
│ store     │ memory:                                  │
│ log       │ -stderr- (info)                          │
│ pid       │ 28700                                    │
│ hostname  │ webserver                                │
│ platform  │ CPython 3.6.8 (Linux x86_64)             │
│ drivers   │ aiokafka=0.4.19 aiohttp=3.5.4            │
│ datadir   │ /tmp/faust-test/myapp-data               │
│ appdir    │ /tmp/faust-test/myapp-data/v1            │
└───────────┴──────────────────────────────────────────┘
[2019-03-21 11:32:19,845: INFO]: [^Worker]: Starting...
[2019-03-21 11:32:19,847: INFO]: [^-App]: Starting...
[2019-03-21 11:32:19,847: INFO]: [^--Monitor]: Starting...
[2019-03-21 11:32:19,848: INFO]: [^--Producer]: Starting...
[2019-03-21 11:32:19,870: INFO]: [^--Consumer]: Starting...
[2019-03-21 11:32:19,871: INFO]: [^---MethodQueue]: Starting...
[2019-03-21 11:32:19,871: INFO]: [^----MethodQueueWorker]: Starting...
[2019-03-21 11:32:19,871: INFO]: [^----MethodQueueWorker]: Starting...
[2019-03-21 11:32:19,871: INFO]: [^---ConsumerThread]: Starting...
[2019-03-21 11:32:19,879: INFO]: [^----MethodQueue]: Starting...
[2019-03-21 11:32:19,880: INFO]: [^-----MethodQueueWorker]: Starting...
[2019-03-21 11:32:19,880: INFO]: [^-----MethodQueueWorker]: Starting...
[2019-03-21 11:32:20,850: INFO]: [^--LeaderAssignor]: Starting...
[2019-03-21 11:32:20,852: INFO]: [^--Producer]: Creating topic myapp-__assignor-__leader
[2019-03-21 11:32:20,862: INFO]: [^--Producer]: Found controller: 1
[2019-03-21 11:32:21,306: INFO]: [^--Producer]: Topic myapp-__assignor-__leader created.
[2019-03-21 11:32:21,306: INFO]: [^--ReplyConsumer]: Starting...
[2019-03-21 11:32:21,306: INFO]: [^--AgentManager]: Starting...
[2019-03-21 11:32:21,307: INFO]: [^Agent: app.myagent]: Starting...
[2019-03-21 11:32:21,311: INFO]: [^-OneForOneSupervisor]: Starting...
[2019-03-21 11:32:21,311: INFO]: [^--Conductor]: Starting...
[2019-03-21 11:32:21,312: INFO]: [^--TableManager]: Starting...
[2019-03-21 11:32:22,314: INFO]: [^--Table: topic1]: Starting...
[2019-03-21 11:32:22,318: INFO]: [^---Store: topic1]: Starting...
[2019-03-21 11:32:22,319: INFO]: [^--Producer]: Creating topic myapp-topic1-changelog
[2019-03-21 11:32:22,325: INFO]: [^--Producer]: Found controller: 1
[2019-03-21 11:32:23,405: INFO]: [^--Producer]: Topic myapp-topic1-changelog created.
[2019-03-21 11:32:23,406: INFO]: [^--Table: topic2]: Starting...
[2019-03-21 11:32:23,407: INFO]: [^---Store: topic2]: Starting...
[2019-03-21 11:32:23,408: INFO]: [^--Producer]: Creating topic myapp-topic2-changelog
[2019-03-21 11:32:23,408: INFO]: [^--Producer]: Creating topic myapp-__assignor-__leader
[2019-03-21 11:32:23,410: INFO]: [^--Producer]: Found controller: 1
[2019-03-21 11:32:23,412: INFO]: [^--Producer]: Found controller: 1
[2019-03-21 11:32:24,475: INFO]: [^--Producer]: Topic myapp-topic2-changelog created.
[2019-03-21 11:32:24,476: INFO]: [^---Recovery]: Starting...

If I comment out topic2_table, the worker starts successfully:

$ faust --loglevel info --app app.app worker --without-web
┌ƒaµS† v1.4.9──────────────────────────────────────────┐
│ id        │ myapp                                    │
│ transport │ [URL('kafka://localhost:9092')]          │
│ store     │ memory:                                  │
│ log       │ -stderr- (info)                          │
│ pid       │ 19870                                    │
│ hostname  │ webserver                                │
│ platform  │ CPython 3.6.8 (Linux x86_64)             │
│ drivers   │ aiokafka=0.4.19 aiohttp=3.5.4            │
│ datadir   │ /tmp/faust-test/myapp-data               │
│ appdir    │ /tmp/faust-test/myapp-data/v1            │
└───────────┴──────────────────────────────────────────┘
[2019-03-21 12:03:40,048: INFO]: [^Worker]: Starting... 
[2019-03-21 12:03:40,050: INFO]: [^-App]: Starting... 
[2019-03-21 12:03:40,050: INFO]: [^--Monitor]: Starting... 
[2019-03-21 12:03:40,050: INFO]: [^--Producer]: Starting... 
[2019-03-21 12:03:40,074: INFO]: [^--Consumer]: Starting... 
[2019-03-21 12:03:40,074: INFO]: [^---MethodQueue]: Starting... 
[2019-03-21 12:03:40,074: INFO]: [^----MethodQueueWorker]: Starting... 
[2019-03-21 12:03:40,074: INFO]: [^----MethodQueueWorker]: Starting... 
[2019-03-21 12:03:40,075: INFO]: [^---ConsumerThread]: Starting... 
[2019-03-21 12:03:40,090: INFO]: [^----MethodQueue]: Starting... 
[2019-03-21 12:03:40,090: INFO]: [^-----MethodQueueWorker]: Starting... 
[2019-03-21 12:03:40,091: INFO]: [^-----MethodQueueWorker]: Starting... 
[2019-03-21 12:03:41,052: INFO]: [^--LeaderAssignor]: Starting... 
[2019-03-21 12:03:41,054: INFO]: [^--Producer]: Creating topic myapp-__assignor-__leader 
[2019-03-21 12:03:41,061: INFO]: [^--Producer]: Found controller: 1 
[2019-03-21 12:03:41,445: INFO]: [^--Producer]: Topic myapp-__assignor-__leader created. 
[2019-03-21 12:03:41,446: INFO]: [^--ReplyConsumer]: Starting... 
[2019-03-21 12:03:41,446: INFO]: [^--AgentManager]: Starting... 
[2019-03-21 12:03:41,446: INFO]: [^Agent: app.myagent]: Starting... 
[2019-03-21 12:03:41,451: INFO]: [^-OneForOneSupervisor]: Starting... 
[2019-03-21 12:03:41,452: INFO]: [^--Conductor]: Starting... 
[2019-03-21 12:03:41,452: INFO]: [^--TableManager]: Starting... 
[2019-03-21 12:03:42,454: INFO]: [^--Table: topic1]: Starting... 
[2019-03-21 12:03:42,457: INFO]: [^---Store: topic1]: Starting... 
[2019-03-21 12:03:42,458: INFO]: [^--Producer]: Creating topic myapp-topic1-changelog 
[2019-03-21 12:03:42,462: INFO]: [^--Producer]: Found controller: 1 
[2019-03-21 12:03:43,453: INFO]: [^--Producer]: Creating topic myapp-__assignor-__leader 
[2019-03-21 12:03:43,495: INFO]: [^--Producer]: Topic myapp-topic1-changelog created. 
[2019-03-21 12:03:43,496: INFO]: [^---Recovery]: Starting... 
[2019-03-21 12:03:43,498: INFO]: [^--Producer]: Found controller: 1 
[2019-03-21 12:03:43,508: INFO]: [^--Producer]: Creating topic myapp-topic1-changelog 
[2019-03-21 12:03:43,510: INFO]: [^--Producer]: Found controller: 1 
[2019-03-21 12:03:44,096: INFO]: Updating subscribed topics to: frozenset({'myapp-topic1-changelog', 'myapp-__assignor-__leader', 'topic1'}) 
[2019-03-21 12:03:44,101: INFO]: Subscribed to topic(s): {'myapp-topic1-changelog', 'myapp-__assignor-__leader', 'topic1'} 
[2019-03-21 12:03:44,221: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:44,570: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:44,953: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:45,140: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:45,247: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:45,355: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:45,465: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:45,575: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:45,684: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:45,792: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:45,899: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:46,006: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:46,115: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:46,223: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:46,333: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:46,447: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:46,554: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:46,659: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:46,767: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:46,875: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:46,981: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:47,086: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:47,196: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:47,303: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:47,409: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:47,514: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:47,622: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:47,729: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:47,835: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:47,940: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:48,046: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:48,157: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:48,264: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:48,374: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:48,484: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:48,595: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:48,703: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:48,809: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:48,914: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:49,018: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:49,123: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:49,228: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:49,335: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:49,442: ERROR]: Group Coordinator Request failed: [Error 15] GroupCoordinatorNotAvailableError 
[2019-03-21 12:03:49,549: INFO]: Discovered coordinator 1 for group myapp 
[2019-03-21 12:03:49,551: INFO]: Revoking previously assigned partitions set() for group myapp 
[2019-03-21 12:03:50,102: INFO]: (Re-)joining group myapp 
[2019-03-21 12:03:50,129: INFO]: Joined group 'myapp' (generation 1) with member_id faust-1.4.9-d51e208c-59e3-4962-8221-8854d7c73beb 
[2019-03-21 12:03:50,129: INFO]: Elected group leader -- performing partition assignments using faust 
[2019-03-21 12:03:50,188: INFO]: Successfully synced group myapp with generation 1 
[2019-03-21 12:03:50,189: INFO]: Setting newly assigned partitions 
┌Topic Partition Set────────┬──────────────────────────┐                                                                                           
│ topic                     │ partitions               │                                                                                           
├───────────────────────────┼──────────────────────────┤                                                                                           
│ myapp-topic1-changelog    │ {0, 1, 2, 3, 4, 5, 6, 7} │                                                                                           
│ topic1                    │ {0}                      │                                                                                           
│ myapp-__assignor-__leader │ {0}                      │                                                                                           
└───────────────────────────┴──────────────────────────┘ for group myapp                                                                           
[2019-03-21 12:03:55,058: INFO]: [^---Recovery]: Highwater for active changelog partitions:
┌Highwater - Active──────┬───────────┬───────────┐                                                                                                 
│ topic                  │ partition │ highwater │                                                                                                 
├────────────────────────┼───────────┼───────────┤                                                                                                 
│ myapp-topic1-changelog │ 0         │ -1        │                                                                                                 
│ myapp-topic1-changelog │ 1         │ -1        │                                                                                                 
│ myapp-topic1-changelog │ 2         │ -1        │                                                                                                 
│ myapp-topic1-changelog │ 3         │ -1        │
│ myapp-topic1-changelog │ 4         │ -1        │
│ myapp-topic1-changelog │ 5         │ -1        │
│ myapp-topic1-changelog │ 6         │ -1        │
│ myapp-topic1-changelog │ 7         │ -1        │
└────────────────────────┴───────────┴───────────┘ 
[2019-03-21 12:03:56,456: INFO]: [^---Recovery]: active offsets at start of reading:
┌Reading Starts At - Active──────────┬────────┐
│ topic                  │ partition │ offset │
├────────────────────────┼───────────┼────────┤
│ myapp-topic1-changelog │ 0         │ -1     │
│ myapp-topic1-changelog │ 1         │ -1     │
│ myapp-topic1-changelog │ 2         │ -1     │
│ myapp-topic1-changelog │ 3         │ -1     │
│ myapp-topic1-changelog │ 4         │ -1     │
│ myapp-topic1-changelog │ 5         │ -1     │
│ myapp-topic1-changelog │ 6         │ -1     │
│ myapp-topic1-changelog │ 7         │ -1     │
└────────────────────────┴───────────┴────────┘ 
[2019-03-21 12:03:57,073: INFO]: [^---Recovery]: standby offsets at start of reading:
┌Reading Starts At - Standby─┐
│ topic │ partition │ offset │
└───────┴───────────┴────────┘ 
[2019-03-21 12:03:57,861: INFO]: [^---Recovery]: Resuming flow... 
[2019-03-21 12:03:57,862: INFO]: [^---Recovery]: Recovery complete 
[2019-03-21 12:03:57,963: INFO]: [^---Recovery]: Restore complete! 
[2019-03-21 12:03:57,964: INFO]: [^---Recovery]: Seek stream partitions to committed offsets. 
[2019-03-21 12:03:58,075: INFO]: [^--Fetcher]: Starting... 
[2019-03-21 12:03:58,075: INFO]: [^---Recovery]: Worker ready 
[2019-03-21 12:03:58,077: INFO]: [^Worker]: Ready

The underlying storage implementation for the table does not seem to be involved: I am experiencing the same behaviour as in my original post when using the rocksdb store instead of memory.
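
For reference, switching stores in the minimal example is a one-line change:

# same example with the rocksdb store instead of memory
# (requires the python-rocksdb package to be installed)
app = faust.App('myapp', broker='kafka://localhost:9094', store='rocksdb://')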

I created a gist with a docker-compose file that uses the exact same versions of Zookeeper and Kafka as I am using: https://gist.github.com/seifertm/10cf65a58b08d9301382d5014fdc9121

seifertm avatar Mar 21 '19 11:03 seifertm

I'm also having this issue. I have tried different versions of Kafka to no avail. It only happens on the first startup of Faust, when the Kafka logs are empty. In debugging, though, we clear the Kafka logs and Faust data frequently, which means we have to do a start and restart after every data purge.

Tested with Faust versions 1.4.x through 1.5.4.

amerski99 avatar Apr 10 '19 23:04 amerski99

Did some more testing on this. Apparently it is not necessary to clear the Faust data, just the Kafka logs. My suspicion is that it is related to timing (i.e., if creating the initial topics takes too long before reaching the recovery step, it will hang). I was able to recreate this from the example app in the integration tests.

Working with Faust 1.5.4, clearing Kafka out, and running this file with the worker command causes it to get stuck at the recovery step:

from typing import List
import faust


class Point(faust.Record):
    x: int
    y: int


class Arena(faust.Record):
    points: List[Point]
    timestamp: float = None


app = faust.App('t-integration', origin='t.integration.app',
                broker='kafka://host.docker.internal:31101')
add_topic = app.topic('add-topic', internal=True)
add_table1 = app.Table('add-table', partitions=1)
add_table2 = app.Table('add-table2')
add_table3 = app.Table('add-table3')
add_table4 = app.Table('add-table4')
add_table5 = app.Table('add-table5')
add_table6 = app.Table('add-table6')
local_channel = app.channel()


@app.agent()
async def mul(stream):
    """Foo agent help."""
    async for event in stream:
        yield event * 2


@app.agent(add_topic)
async def add(stream):
    async for event in stream:
        yield event + 2


@app.agent(local_channel)
async def internal(stream):
    async for event in stream:
        yield event / 2


@app.command()
async def error_command():
    raise Exception('foo')


if __name__ == '__main__':
    app.main()

amerski99 avatar Apr 15 '19 17:04 amerski99

It seems to be determined only by the number of tables, not the number of topics. Doing the same thing with only one table and many topics seems to work. If it doesn't hang with the example I gave above, make sure you clear the Kafka logs before trying. If it still works, my guess is that your machine is faster than the one I'm running it on. Try adding more example tables, as in the sketch below.
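
For example (an illustrative sketch that just extends the table definitions in the reproduction above; the names continue the add-tableN pattern):

# declare five more tables on top of the six defined above
extra_tables = [app.Table(f'add-table{i}') for i in range(7, 12)]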

amerski99 avatar Apr 15 '19 18:04 amerski99

A lot has changed since 1.5.4, so this could be fixed already.

@amerski99 I tried running your example a few times but was unable to reproduce. I did not have any data in the topics, so maybe that's why. It's also possible this is simply fixed in the latest version, or that it's a Kafka server version-specific issue.

ask avatar May 01 '19 21:05 ask

@ask This issue is still happening in the latest code version (v1.7.0a1). Not having data in any of the topics is OK (in fact, it's essential).

This issue only happens the first time you start Faust when Kafka is empty. My suspicion is that on the first startup the producer takes longer to create the initial topics, and the order of operations gets screwed up. Probably one step depends on another step without an explicit dependency requirement; the steps usually happen in order, but not when there are too many topics to create. My evidence for this is that it does not happen with fewer tables. Six was the number of tables it took to trigger it on my machine. If my theory is correct, a faster machine might change the number of tables required, so if it's still not happening for you with six tables, try adding more. Again, make sure the Kafka server is empty before trying.

The Kafka version I'm using is the Docker image wurstmeister/kafka:2.12-2.2.0. I've tried other versions as well. If you are still not able to recreate it, let me know what Kafka version you are using and I can try that as well.

amerski99 avatar May 02 '19 20:05 amerski99

Is there any update on this issue? I am using Faust 1.7.1 and see: [2019-10-22 00:55:49,549: ERROR]: Group Coordinator Request failed: [Error 15] CoordinatorNotAvailableError. The issue occurs when Kafka is not in a good state, but the problem is that the Faust app never recovers and I have to restart it manually. I believe there could be ways to handle these errors in my app and take recovery action. Please let me know if there are any ideas on how to handle it.

amitkg29 avatar Oct 23 '19 05:10 amitkg29

Further questions: a) Will the on_error() callback be invoked for the above error? b) Shouldn't the Faust app recover from CoordinatorNotAvailableError on its own? If yes, are any specific settings required?

Is there any way to catch this error in my app and write a handler?
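
In the meantime, one stopgap would be to supervise the worker process externally and restart it if it never becomes ready. A rough sketch — everything here is an assumption: it relies on the worker printing "[^Worker]: Ready" on successful startup, as in the logs above, and the command line and timeout are illustrative:

import subprocess
import time

CMD = ['faust', '--app', 'app.app', 'worker', '--without-web']  # illustrative
READY_MARKER = '[^Worker]: Ready'
STARTUP_TIMEOUT = 120.0  # seconds; tune for your cluster


def run_until_ready() -> int:
    while True:
        # faust workers log to stderr by default
        proc = subprocess.Popen(CMD, stderr=subprocess.PIPE,
                                text=True, bufsize=1)
        deadline = time.monotonic() + STARTUP_TIMEOUT
        ready = False
        for line in proc.stderr:
            print(line, end='')
            if not ready and READY_MARKER in line:
                ready = True
            # note: this check only fires when a new log line arrives; use a
            # reader thread if you need a hard timeout on a silent worker
            if not ready and time.monotonic() > deadline:
                break
        if ready:
            return proc.wait()  # worker came up; run until it exits
        proc.terminate()  # presumed hung during recovery: kill and retry
        proc.wait()
        time.sleep(5)


if __name__ == '__main__':
    raise SystemExit(run_until_ready())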

amitkg29 avatar Oct 29 '19 10:10 amitkg29

@amitkg29 Hi, I am also hitting the same issue with my Faust app. Did you find a way to resolve it? Thanks.

LiYunJamesPhD avatar Feb 15 '20 18:02 LiYunJamesPhD

I'm unable to run integration tests on my consumers because I always get this error when I start a worker for the first time.

Do you have any news on this?

max-k avatar Oct 28 '20 08:10 max-k

I also have the same issue, but I did not get it the first time the worker started. In my case it started when I stopped Kafka, sent the agent some data, and then restarted the Faust app after the Kafka cluster came back up.

naveenreddyin avatar Nov 24 '20 21:11 naveenreddyin

Same error for me. It's a bit late, but it could help someone else. This is caused by a desync between Kafka and Zookeeper: when you destroy your Kafka, there is still topic information in Zookeeper. To solve this, you can connect to your ZK and execute a cleanup: deleteall /brokers. Be careful: this will erase all Kafka synchronisation config.
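
For example, with the zkCli.sh shipped with ZooKeeper (the host and port are assumptions; on ZooKeeper 3.4 and earlier the equivalent command is rmr):

$ zkCli.sh -server localhost:2181
[zk: localhost:2181(CONNECTED) 0] deleteall /brokers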

elrik75 avatar Jan 15 '22 16:01 elrik75