pykafka
IllegalGeneration exception from SimpleConsumer
INFO:pykafka.balancedconsumer:Rebalancing consumer "pykafka-88ab9f20-80db-493c-9253-998a14cfb0d6" for topic "my-replicated-topic".
INFO:pykafka.managedbalancedconsumer:Sending JoinGroupRequest for consumer id 'pykafka-88ab9f20-80db-493c-9253-998a14cfb0d6'
INFO:pykafka.balancedconsumer:pykafka-88ab9f20-80db-493c-9253-998a14cfb0d6: Balancing 2 participants for 10 partitions. Owning 5 partitions.
DEBUG:pykafka.balancedconsumer:My partitions: ['my-replicated-topic-0-2', 'my-replicated-topic-0-5', 'my-replicated-topic-0-8', 'my-replicated-topic-1-3', 'my-replicated-topic-1-0']
INFO:pykafka.balancedconsumer:pykafka-88ab9f20-80db-493c-9253-998a14cfb0d6: Balancing 2 participants for 10 partitions. Owning 5 partitions.
DEBUG:pykafka.balancedconsumer:My partitions: ['my-replicated-topic-2-7', 'my-replicated-topic-1-6', 'my-replicated-topic-1-9', 'my-replicated-topic-2-4', 'my-replicated-topic-2-1']
INFO:pykafka.managedbalancedconsumer:Sending SyncGroupRequest for consumer id 'pykafka-88ab9f20-80db-493c-9253-998a14cfb0d6'
DEBUG:pykafka.simpleconsumer:Committing offsets for 10 partitions to broker id 2
INFO:pykafka.simpleconsumer:Continuing in response to IllegalGeneration
ERROR:pykafka.simpleconsumer:Error committing offsets for topic 'my-replicated-topic' from consumer id 'pykafka-88ab9f20-80db-493c-9253-998a14cfb0d6'(errors: {<class 'pykafka.exceptions.IllegalGeneration'>: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]})
DEBUG:pykafka.simpleconsumer:Retrying
DEBUG:pykafka.simpleconsumer:Fetcher thread exiting
INFO:pykafka.simpleconsumer:Continuing in response to IllegalGeneration
ERROR:pykafka.simpleconsumer:Error committing offsets for topic 'my-replicated-topic' from consumer id 'pykafka-88ab9f20-80db-493c-9253-998a14cfb0d6'(errors: {<class 'pykafka.exceptions.IllegalGeneration'>: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]})
DEBUG:pykafka.simpleconsumer:Retrying
INFO:pykafka.simpleconsumer:Continuing in response to IllegalGeneration
ERROR:pykafka.simpleconsumer:Error committing offsets for topic 'my-replicated-topic' from consumer id 'pykafka-88ab9f20-80db-493c-9253-998a14cfb0d6'(errors: {<class 'pykafka.exceptions.IllegalGeneration'>: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]})
DEBUG:pykafka.simpleconsumer:Retrying
INFO:pykafka.simpleconsumer:Continuing in response to IllegalGeneration
ERROR:pykafka.simpleconsumer:Error committing offsets for topic 'my-replicated-topic' from consumer id 'pykafka-88ab9f20-80db-493c-9253-998a14cfb0d6'(errors: {<class 'pykafka.exceptions.IllegalGeneration'>: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]})
DEBUG:pykafka.simpleconsumer:Retrying
INFO:pykafka.simpleconsumer:Continuing in response to IllegalGeneration
ERROR:pykafka.simpleconsumer:Error committing offsets for topic 'my-replicated-topic' from consumer id 'pykafka-88ab9f20-80db-493c-9253-998a14cfb0d6'(errors: {<class 'pykafka.exceptions.IllegalGeneration'>: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]})
INFO:pykafka.cluster:Attempting to discover offset manager for consumer group 'testgroup'
INFO:pykafka.cluster:Found coordinator broker with id 2
DEBUG:pykafka.simpleconsumer:Fetching offsets for 5 partitions from broker id 2
DEBUG:pykafka.simpleconsumer:Set offset for partition 1 to 282963
DEBUG:pykafka.simpleconsumer:Set offset for partition 4 to 283507
DEBUG:pykafka.simpleconsumer:Set offset for partition 9 to 284755
DEBUG:pykafka.simpleconsumer:Set offset for partition 6 to 284410
DEBUG:pykafka.simpleconsumer:Set offset for partition 7 to 282724
I use topic.get_balanced_consumer(consumer_group='testgroup', auto_commit_enable=True, managed=True, consumer_timeout_ms=1500) to get a ManagedBalancedConsumer. This is a self-balancing consumer, but I get the error above.
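For reference, this is roughly how I set it up (the broker address and topic name below are placeholders, not my real config):

from pykafka import KafkaClient

client = KafkaClient(hosts="127.0.0.1:9092")  # placeholder broker list
topic = client.topics[b'my-replicated-topic']

# Managed balanced consumer: group membership and rebalancing are
# coordinated by the Kafka brokers rather than via Zookeeper.
consumer = topic.get_balanced_consumer(
    consumer_group=b'testgroup',
    auto_commit_enable=True,
    managed=True,
    consumer_timeout_ms=1500,
)

while True:
    message = consumer.consume()  # returns None after consumer_timeout_ms
    if message is not None:
        print(message.offset, message.value)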
Thanks @lovemfchen. Does this error happen immediately upon consumer initialization, or during a rebalance that occurs after it's been running for a while?
It happens during a rebalance, when I create a consumer with the same group id. I think the new SimpleConsumer has been created, but the old one still commits offsets with the old generation id.
We see a number of these upon initialization of a managed balanced consumer
I'm experiencing this same issue
Help is definitely wanted on this issue, since it concerns an API that we're not currently using in production at Parse.ly.
@razor-1 If you're able, it would be very helpful to provide some more information about these errors on initialization. Perhaps you can post some log output that shows the problem, or some example code that can be used to replicate it?
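For what it's worth, something along these lines (just standard Python logging, nothing pykafka-specific) should capture the DEBUG-level output that is most useful here:

import logging

# Route pykafka's loggers to stderr at DEBUG level so the commit/rebalance
# sequence around the error is visible.
logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s %(message)s")
logging.getLogger("pykafka").setLevel(logging.DEBUG)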
I see things like this:
2018-03-14 23:17:18,814 pykafka.simpleconsumer ERROR Error committing offsets for topic 'b'topic_name'' from consumer id 'b'pykafka-20acb16b-ea5f-44a7-a46d-456273907ea2''(errors: {<class 'pykafka.exceptions.IllegalGeneration'>: [1, 2]})
I can try to find the combination of Kafka configs and pykafka code that makes this reproducible, but it might take some time.
I think this is related:
Sometimes, when using managed=True, if I am in the middle of a rebalance I get:
ERROR (pykafka.simpleconsumer): Error committing offsets for topic 'b'segments'' from consumer id 'b'pykafka-4db4a00f-c3e8-405c-b455-5acc0efb1206''(errors: {<class 'pykafka.exceptions.RebalanceInProgress'>: [0, 1, 2, 3, 4, 5]})
while other times in the same situation I get:
INFO (pykafka.simpleconsumer): Continuing in response to IllegalGeneration
2018-03-29 19:43:33 [24353] ERROR (pykafka.simpleconsumer): Error committing offsets for topic 'b'segments'' from consumer id 'b'pykafka-25ea9350-c0d4-4a8a-862e-457d9ae218b5''(errors: {<class 'pykafka.exceptions.IllegalGeneration'>: [0, 1, 2, 3, 4, 5]})
Yet other times everything works out cleanly.
My consumers always handle the rebalance cleanly if I use managed=False
Note that my broker is very slow (it runs on a Raspberry Pi), which perhaps makes it easier to trigger the bug: I see it quite reproducibly.
I trigger the rebalance by shutting down some consumers by calling stop() on them.
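In case it helps, here is a rough sketch of what I am doing (host, topic, and group name are placeholders; I actually run the consumers in separate processes, but threads show the same pattern):

import threading
import time
from pykafka import KafkaClient

def run_consumer(stop_event):
    # Each consumer gets its own client; all of them join the same managed group.
    client = KafkaClient(hosts="raspberrypi:9092")  # placeholder broker
    topic = client.topics[b'segments']
    consumer = topic.get_balanced_consumer(
        consumer_group=b'testgroup',
        managed=True,
        auto_commit_enable=True,
        consumer_timeout_ms=1000,
    )
    while not stop_event.is_set():
        message = consumer.consume()  # returns None after consumer_timeout_ms
        if message is not None:
            pass  # process the message here
    consumer.stop()  # this is what triggers the rebalance in the group

stop_a, stop_b = threading.Event(), threading.Event()
threads = [threading.Thread(target=run_consumer, args=(e,)) for e in (stop_a, stop_b)]
for t in threads:
    t.start()
time.sleep(60)  # let both consumers join the group and consume for a while
stop_a.set()    # stop one consumer; the survivor rebalances and sometimes hits the errors above
time.sleep(60)
stop_b.set()
for t in threads:
    t.join()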
@jmoraleda It's really interesting that you find these failures more reliable on a slow processor. That might be key to some improvements we could make to the integration tests to make them shake out these types of timing issues. By the way, that's my hunch about what this might be - synchronization between the various internal threads in a consumer works a bit differently when using managed=True, so I'm not surprised different execution models/patterns exhibit the problem differently.