
bireme hangs about 1 minute after startup; the logs record no errors

Open jinxiaoxin opened this issue 8 years ago • 34 comments

I configured 50 tables, and within 1 minute of starting, bireme hangs. It does nothing and reports no errors. Has anyone run into this?

jinxiaoxin · Dec 14 '17 06:12

First, please check whether there is an active load query in the database. Then check the loading status with the REST API or JMX.

wangzw · Dec 14 '17 10:12
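A minimal sketch of the REST status check suggested above, written against the Java 8-era JDK to match the stack traces in this thread. It assumes the status endpoint listens on port 8080 and that the path segment after the host is the data source name, as in the reporter's queries to ip:8080/mysql and ip:8080/. The class and method names are hypothetical, and no response format is assumed, so the body is returned as raw text.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for polling bireme's HTTP status endpoint. */
public class BiremeStatusCheck {

    /**
     * GETs http://host:port/path and returns the response body as text.
     * Throws IOException on timeouts or on any non-200 status (for example a 500).
     */
    public static String fetchStatus(String host, int port, String path) throws IOException {
        URL url = URI.create("http://" + host + ":" + port + "/" + path).toURL();
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(5000); // do not wait forever if the process is hung
        conn.setReadTimeout(5000);
        try {
            int code = conn.getResponseCode();
            if (code != HttpURLConnection.HTTP_OK) {
                throw new IOException("HTTP " + code + " from " + url);
            }
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }
}
```

Usage would be fetchStatus("your-bireme-host", 8080, "mysql") for one source and fetchStatus("your-bireme-host", 8080, "") for the overall status; substitute your own host and source name.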

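@wangzw What does your first sentence mean? I didn't understand it. Syncing 5 tables works fine, but as soon as I add more tables, the problem above appears.

REST API ip:8080/mysql
result: (screenshot)

REST API ip:8080/ result: (screenshot)

Why?

jinxiaoxin · Dec 15 '17 07:12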

Which version of bireme are you using?

wangzw · Dec 15 '17 09:12

@wangzw Version: Release v1.0. Have you synced an instance with a large number of tables? Were there any problems?

jinxiaoxin · Dec 15 '17 09:12

Have you checked whether the binlog data has actually been produced to Kafka?

wangzw · Dec 15 '17 09:12

I've been monitoring it in real time the whole time; the data is definitely being produced to Kafka.

jinxiaoxin · Dec 15 '17 09:12

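@wangzw I've found that the problem appears whenever many tables are configured in bireme. For example, if Kafka only has data for 5 tables but bireme is configured with 50 tables, it breaks. Do I need to tune some parameters?

jinxiaoxin · Dec 18 '17 03:12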

@jinxiaoxin Thanks for your information.

It would help to enable TRACE-level logging for bireme in log4j2.xml.

wangzw · Dec 18 '17 08:12

@Rucfisher What if a topic we subscribe to does not exist? This needs more investigation.

wangzw · Dec 18 '17 08:12

I've always used log4j before and I'm not familiar with log4j2. Is it enough to change this to the trace level? (screenshot)

jinxiaoxin · Dec 18 '17 08:12

@jinxiaoxin Yes, just change the level to trace.

RebeccaZxy · Dec 18 '17 08:12

@RebeccaZxy After changing it, it's the same as before: no error messages, and it still hangs.

jinxiaoxin · Dec 18 '17 09:12

@jinxiaoxin Please also set the root logger level to trace and try again.

wangzw · Dec 18 '17 09:12

@jinxiaoxin It would also help to dump bireme's thread stacks with the jstack command-line tool.

wangzw · Dec 18 '17 09:12
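jstack -l <pid> is the usual way to get this from outside the process. As an in-process alternative, here is a sketch using the standard java.lang.management API (the class name is hypothetical): ThreadMXBean.findDeadlockedThreads() reports threads caught in a cycle of monitor or ownable-synchronizer waits, and dumpAllThreads(true, true) includes lock information similar to jstack -l. Note that it only flags true cycles. A thread parked in a timed offer() on a full queue is TIMED_WAITING, not waiting to acquire a lock, so an empty result does not rule out a stall like the one in the dump posted later in this thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical in-process diagnostic, roughly what `jstack -l` shows. */
public class DeadlockProbe {

    /** Returns one line per deadlocked thread, or an empty string if no cycle exists. */
    public static String describeDeadlocks() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] ids = mx.findDeadlockedThreads(); // null when there is no deadlock
        if (ids == null) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : mx.getThreadInfo(ids, true, true)) {
            if (info == null) {
                continue; // the thread may have terminated in the meantime
            }
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState())
              .append(" waiting for ").append(info.getLockName())
              .append(" held by \"").append(info.getLockOwnerName()).append("\"\n");
        }
        return sb.toString();
    }

    /** Dump of all threads with locked monitors and synchronizers. */
    public static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : ManagementFactory.getThreadMXBean().dumpAllThreads(true, true)) {
            sb.append(info); // in OpenJDK, ThreadInfo.toString() truncates long stacks
        }
        return sb.toString();
    }
}
```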

@wangzw bireme.out still has no errors. bireme.err now has a lot more output, but no errors there either.

jinxiaoxin · Dec 18 '17 09:12

Any message would help. Please paste it into this issue or email it to me. jstack output would also be useful.

wangzw · Dec 18 '17 10:12

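@wangzw jstack info (jstack -l <pid>): jstack_message.txt

Can you see what the problem is? I also just noticed that if there is a large backlog of unconsumed data in Kafka, bireme almost always hangs within about 1 minute of starting.

jinxiaoxin · Dec 19 '17 06:12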


"Provider mysql" #38 prio=5 os_prio=0 tid=0x0000000001e16000 nid=0x8e9 waiting on condition [0x00007fefd4a3e000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000003cfd06170> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
	at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:385)
	at cn.hashdata.bireme.provider.KafkaProvider.call(KafkaProvider.java:161)
	at cn.hashdata.bireme.provider.KafkaProvider.call(KafkaProvider.java:35)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
	- <0x00000003cfc967e8> (a java.util.concurrent.ThreadPoolExecutor$Worker)

"Dispatcher" #37 prio=5 os_prio=0 tid=0x0000000001d0d800 nid=0x8e8 waiting for monitor entry [0x00007fefd4d3e000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at cn.hashdata.bireme.RowCache.createBatch(RowCache.java:91)
	- waiting to lock <0x00000003cfe7c4e0> (a cn.hashdata.bireme.RowCache)
	at cn.hashdata.bireme.RowCache.addRows(RowCache.java:62)
	at cn.hashdata.bireme.Dispatcher.insertRowSet(Dispatcher.java:139)
	at cn.hashdata.bireme.Dispatcher.checkTansformResults(Dispatcher.java:124)
	at cn.hashdata.bireme.Dispatcher.call(Dispatcher.java:73)
	at cn.hashdata.bireme.Dispatcher.call(Dispatcher.java:34)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
	- <0x00000003cfce25f0> (a java.util.concurrent.ThreadPoolExecutor$Worker)

"TaskGenerator" #36 prio=5 os_prio=0 tid=0x0000000001d0b800 nid=0x8e7 waiting on condition [0x00007fefd4e40000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000003cfe7c768> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
	at java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:385)
	at cn.hashdata.bireme.RowCache.createBatch(RowCache.java:116)
	- locked <0x00000003cfe7c4e0> (a cn.hashdata.bireme.RowCache)
	at cn.hashdata.bireme.RowCache.fetchBatch(RowCache.java:130)
	at cn.hashdata.bireme.TaskGenerator.generateMergeTask(TaskGenerator.java:96)
	at cn.hashdata.bireme.TaskGenerator.call(TaskGenerator.java:63)
	at cn.hashdata.bireme.TaskGenerator.call(TaskGenerator.java:32)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
	- <0x00000003cfd54b30> (a java.util.concurrent.ThreadPoolExecutor$Worker)


Seems like a deadlock.

wangzw · Dec 19 '17 07:12
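Reading the dump: TaskGenerator holds the RowCache monitor <0x00000003cfe7c4e0> inside createBatch while parked in a timed LinkedBlockingQueue.offer, which only parks when that queue is full. Dispatcher is BLOCKED waiting for the same monitor (addRows to createBatch), and "Provider mysql" is parked offering into another full queue. Holding a monitor across a blocking queue operation is what chains these threads together. A common way to break that coupling is sketched below under stated assumptions. This is not bireme's actual RowCache; the class, fields and methods are hypothetical and only illustrate the locking pattern. The batch is cut while holding the lock, the lock is released before the blocking offer, and the rows are put back if the offer times out or is interrupted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical row cache; NOT bireme's RowCache, only the locking pattern. */
public class SafeRowCache {
    private final List<String> rows = new ArrayList<>();
    private final BlockingQueue<List<String>> batches; // bounded downstream queue

    public SafeRowCache(BlockingQueue<List<String>> batches) {
        this.batches = batches;
    }

    /** Producer side (compare Dispatcher.insertRowSet): holds the monitor only briefly. */
    public synchronized void addRows(List<String> newRows) {
        rows.addAll(newRows);
    }

    public synchronized int pendingRows() {
        return rows.size();
    }

    /**
     * Consumer side (compare TaskGenerator.fetchBatch): the batch is cut under the lock,
     * but the possibly blocking offer runs WITHOUT the lock, so addRows never waits
     * on a full downstream queue.
     */
    public boolean createBatch(long timeout, TimeUnit unit) throws InterruptedException {
        List<String> batch;
        synchronized (this) {
            if (rows.isEmpty()) {
                return false;
            }
            batch = new ArrayList<>(rows);
            rows.clear();
        }
        boolean offered = false;
        try {
            offered = batches.offer(batch, timeout, unit);
            return offered;
        } finally {
            if (!offered) {
                // Timed out or interrupted: put the rows back ahead of anything added meanwhile.
                synchronized (this) {
                    rows.addAll(0, batch);
                }
            }
        }
    }
}
```

The trade-off is that rows can arrive while a batch is in flight, so ordering is preserved only because failed batches are re-inserted at the front. Whether this alone removes the stall in bireme depends on what drains the downstream queue.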

Hmm... so how do I fix this?

jinxiaoxin · Dec 19 '17 08:12

Removing this line may work. We will properly fix this issue in the 2.0 release.

https://github.com/HashDataInc/bireme/blob/e5e388911fe0dbddb14b896a3ba398ba0d0241f1/src/main/java/cn/hashdata/bireme/RowCache.java#L130

wangzw · Dec 19 '17 08:12

After removing this code, does it copy rows one at a time? Is there no more merging?

jinxiaoxin · Dec 19 '17 10:12

Nope, still in batch.

wangzw · Dec 19 '17 10:12

1. I commented out createBatch(); but it's the same: when too much data piles up in Kafka, it hangs within 1 minute.
2. Why does my HTTP server keep returning 500 errors? (screenshot)

jinxiaoxin · Dec 20 '17 07:12

We will fix this issue, together with the 500 error, in the next release.

wangzw · Dec 21 '17 06:12

(screenshot)

shubifeng · Jan 25 '18 03:01

@shubifeng What do you mean?

jinxiaoxin · Jan 25 '18 03:01

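I'm hitting the same problem. Syncing a single table works fine, but with 200+ tables it basically stops consuming, and there are no logs at all. I don't know whether it's because the machine's specs are too low.

shubifeng · Jan 25 '18 03:01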

Is your pipeline maxwell ---> kafka ---> bireme?

jinxiaoxin · Jan 25 '18 03:01

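Yes. Have you solved your problem? Mine hangs right at startup with no error messages at all. The machine only has a few tens of MB of free memory, which is probably related. What's your QQ number? Let's connect and compare notes.

shubifeng · Jan 25 '18 03:01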

@jinxiaoxin Hey man, did you get it all sorted out?

shubifeng · Mar 16 '18 03:03