[^---AIOKafkaConsumerThread]: Thread keepalive is not responding ERROR

Open informatica92 opened this issue 4 years ago • 4 comments

Checklist

  • [v] I have included information about relevant versions
  • [v] I have verified that the issue persists when using the master branch of Faust.

Steps to reproduce

Set up a normal streaming application with RocksDB and an internal timer that deletes older rows from RocksDB.

Expected behavior

I expected the script to continue working normally.

Actual behavior

At random times, the following appears in the log:

[2020-12-21 15:36:51,857] [60692] [ERROR] [^---AIOKafkaConsumerThread]: Thread keepalive is not responding...
[2020-12-21 15:34:04,708] [60692] [ERROR] [^---AIOKafkaConsumerThread]: Thread keepalive is not responding...
[2020-12-21 15:37:01,560] [60692] [WARNING] Heartbeat failed: local member_id was not recognized; resetting and re-joining group
[2020-12-21 15:37:03,509] [60692] [WARNING] Timer commit is overlapping (interval=2.8 runtime=178.82933128997684)
[2020-12-21 15:38:05,538] [60692] [ERROR] [^---AIOKafkaConsumerThread]: Thread keepalive is not responding...
[2020-12-21 15:38:14,381] [60692] [WARNING] Timer commit is overlapping (interval=2.8 runtime=17.595341868989635)
[2020-12-21 15:38:22,749] [60692] [ERROR] [^---AIOKafkaConsumerThread]: Thread keepalive is not responding...
[2020-12-21 15:38:31,374] [60692] [ERROR] [^---AIOKafkaConsumerThread]: Thread keepalive is not responding...
[2020-12-21 15:38:41,520] [60692] [ERROR] [^---AIOKafkaConsumerThread]: Thread keepalive is not responding...
[2020-12-21 15:38:59,004] [60692] [ERROR] [^---AIOKafkaConsumerThread]: Thread keepalive is not responding...
[2020-12-21 15:39:18,977] [60692] [ERROR] [^---AIOKafkaConsumerThread]: Thread keepalive is not responding...
[2020-12-21 15:39:38,909] [60692] [ERROR] [^---AIOKafkaConsumerThread]: Thread keepalive is not responding...
[2020-12-21 15:39:50,086] [60692] [ERROR] [^---AIOKafkaConsumerThread]: Thread keepalive is not responding...
[2020-12-21 15:40:22,655] [60692] [ERROR] [^---AIOKafkaConsumerThread]: Thread keepalive is not responding...
[2020-12-21 15:40:41,770] [60692] [ERROR] [^---AIOKafkaConsumerThread]: Thread keepalive is not responding...
[2020-12-21 15:40:52,244] [60692] [ERROR] [^---AIOKafkaConsumerThread]: Thread keepalive is not responding...
[2020-12-21 15:41:00,917] [60692] [ERROR] [^---AIOKafkaConsumerThread]: Thread keepalive is not responding...
[2020-12-21 15:41:19,617] [60692] [ERROR] [^---AIOKafkaConsumerThread]: Thread keepalive is not responding...
[2020-12-21 15:41:29,388] [60692] [ERROR] [^---AIOKafkaConsumerThread]: Thread keepalive is not responding...
[2020-12-21 15:41:36,996] [60692] [ERROR] [^---AIOKafkaConsumerThread]: Thread keepalive is not responding...
[2020-12-21 15:41:48,920] [60692] [ERROR] [^---AIOKafkaConsumerThread]: Thread keepalive is not responding...
[2020-12-21 15:42:16,437] [60692] [ERROR] [^---AIOKafkaConsumerThread]: Thread keepalive is not responding...
[2020-12-21 15:42:30,207] [60692] [ERROR] [^---AIOKafkaConsumerThread]: Thread keepalive is not responding...
[2020-12-21 15:42:38,045] [60692] [WARNING] Timer commit is overlapping (interval=2.8 runtime=117.13050575199304)
[2020-12-21 15:42:47,174] [60692] [ERROR] [^---AIOKafkaConsumerThread]: Thread keepalive is not responding...
[2020-12-21 15:43:11,633] [60692] [ERROR] [^---AIOKafkaConsumerThread]: Thread keepalive is not responding...
[2020-12-21 15:43:19,542] [60692] [ERROR] [^---AIOKafkaConsumerThread]: Thread keepalive is not responding...
simulate-rt-odd.service: Main process exited, code=killed, status=9/KILL
simulate-rt-odd.service: Failed with result 'signal'.

So basically the application starts raising these errors and is eventually stopped by systemd (at least I suppose).

Full traceback

For the full traceback, see the previous section.

Versions

  • Python version: Python 3.8.5
  • Faust version: faust-streaming 0.3.1
  • Operating system: Ubuntu 20.04.1 LTS (GNU/Linux 5.4.0-1032-gcp x86_64)
  • Kafka version: 2.4.1
  • RocksDB version (if applicable): 6.15.0 (11/13/2020)

informatica92 avatar Dec 22 '20 09:12 informatica92

Has anyone solved this problem, or does anyone know what causes it?

I've also run into this problem in my project. The application raises these errors and does not restart itself, but it only happens at random.

Furthermore, I cannot reproduce the problem. RocksDB is a key-value store, so how do you delete older rows as described in [Steps to reproduce]?

paradonite-Y avatar Jan 08 '21 06:01 paradonite-Y

Nope, I am still facing the same problem. I also tried adding some await asyncio.sleep(0) calls to the deletion procedure, but the problem is still there. I am now trying to set the entities I want to delete to None instead of deleting them. I'll let you know if this solves the problem.
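
For readers unfamiliar with the await asyncio.sleep(0) idea mentioned above, here is a minimal, standard-library-only sketch of it. It is not the reporter's code and does not use Faust: a plain dict stands in for the RocksDB-backed table, and the names purge_older_than, heartbeat, run_demo and yield_every are hypothetical. It only shows how a long deletion loop can hand control back to the event loop so that other tasks (such as a keepalive) get a turn.

```python
import asyncio


async def purge_older_than(store, cutoff, yield_every=1000):
    """Delete entries whose timestamp is below `cutoff`.

    `store` is a plain dict standing in for a table (an assumption,
    not Faust's API). If `yield_every` is non-zero, control is handed
    back to the event loop after every `yield_every` keys.
    """
    removed = 0
    # Snapshot the keys so entries can be deleted while looping.
    for i, key in enumerate(list(store), start=1):
        if store[key] < cutoff:
            del store[key]
            removed += 1
        if yield_every and i % yield_every == 0:
            await asyncio.sleep(0)  # let other tasks run
    return removed


async def heartbeat(beats):
    """Hypothetical keepalive task: records each turn it gets."""
    while True:
        beats.append(1)
        await asyncio.sleep(0)


async def run_demo(yield_every):
    # Keys map to fake timestamps 0..9999; purge everything below 5000.
    store = {f"key-{n}": float(n) for n in range(10_000)}
    beats = []
    hb = asyncio.create_task(heartbeat(beats))
    removed = await purge_older_than(store, 5_000.0, yield_every)
    hb.cancel()
    try:
        await hb
    except asyncio.CancelledError:
        pass
    return removed, len(store), len(beats)


if __name__ == "__main__":
    print("yielding:    ", asyncio.run(run_demo(yield_every=1000)))
    print("not yielding:", asyncio.run(run_demo(yield_every=0)))
```

Without the periodic yield, the heartbeat task never gets to run while the purge is in progress. Note that yielding only helps when the work between yields is short; a single slow synchronous call still blocks the loop, which may be one reason this change alone did not make the errors go away here.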

informatica92 avatar Jan 11 '21 09:01 informatica92

I have probably found a solution. To keep the discussion in a single place, here is the link to the issue where I have already posted it: https://github.com/robinhood/faust/issues/695#issuecomment-766655439

informatica92 avatar Jan 25 '21 09:01 informatica92

We've had some fixes for tables and rocksdb in the meantime. I wonder if this is still an issue with the current master branch.

taybin avatar Oct 21 '21 14:10 taybin

I think this has been fixed; I have not seen this bug occur since I started maintaining this project. There are internal mechanisms in Faust to clean up old RocksDB entries.

wbarnha avatar Jan 13 '23 05:01 wbarnha