
Flink Oracle CDC ORA-27102: out of memory

Open cheng66551 opened this issue 3 years ago • 9 comments

With Flink Oracle CDC version 2.1.0, an Oracle out-of-memory error occurs after the program has been running for some time. Details are as follows:

Caused by: java.sql.SQLRecoverableException: ORA-00603: ORACLE server session terminated by fatal error
ORA-27102: out of memory
Linux-x86_64 Error: 12: Cannot allocate memory
Additional information: 108
Additional information: 655360
ORA-04030: out of process memory when trying to allocate 8392728 bytes (krvxdups: priv,redo read buffer)

	at oracle.jdbc.driver.T4CTTIoer11.processError(T4CTTIoer11.java:509)
	at oracle.jdbc.driver.T4CTTIoer11.processError(T4CTTIoer11.java:461)
	at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:1104)
	at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:550)
	at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:268)
	at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:655)
	at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:270)
	at oracle.jdbc.driver.T4CPreparedStatement.doOall8(T4CPreparedStatement.java:91)
	at oracle.jdbc.driver.T4CPreparedStatement.executeForRows(T4CPreparedStatement.java:970)
	at oracle.jdbc.driver.OracleStatement.executeMaybeDescribe(OracleStatement.java:1012)
	at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1168)
	at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3666)
	at oracle.jdbc.driver.T4CPreparedStatement.executeInternal(T4CPreparedStatement.java:1426)
	at oracle.jdbc.driver.OraclePreparedStatement.executeQuery(OraclePreparedStatement.java:3713)
	at oracle.jdbc.driver.OraclePreparedStatementWrapper.executeQuery(OraclePreparedStatementWrapper.java:1167)
	at io.debezium.connector.oracle.logminer.LogMinerStreamingChangeEventSource.execute(LogMinerStreamingChangeEventSource.java:184)
	... 7 more
Caused by: Error : 603, Position : 0, Sql = SELECT SCN, SQL_REDO, OPERATION_CODE, TIMESTAMP, XID, CSF, TABLE_NAME, SEG_OWNER, OPERATION, USERNAME, ROW_ID, ROLLBACK FROM V$LOGMNR_CONTENTS WHERE SCN > :1  AND SCN <= :2  AND ((OPERATION_CODE IN (5,34) AND USERNAME NOT IN ('SYS','SYSTEM','SYSTEM')) OR (OPERATION_CODE IN (7,36)) OR (OPERATION_CODE IN (1,2,3) AND TABLE_NAME != 'LOG_MINING_FLUSH' AND SEG_OWNER NOT IN ('APPQOSSYS','AUDSYS','CTXSYS','DVSYS','DBSFWUSER','DBSNMP','GSMADMIN_INTERNAL','LBACSYS','MDSYS','OJVMSYS','OLAPSYS','ORDDATA','ORDSYS','OUTLN','SYS','SYSTEM','WMSYS','XDB') AND (REGEXP_LIKE(SEG_OWNER,'^SCOTT$','i')) AND (REGEXP_LIKE(SEG_OWNER || '.' || TABLE_NAME,'^SCOTT.aaa$','i')) )), OriginalSql = SELECT SCN, SQL_REDO, OPERATION_CODE, TIMESTAMP, XID, CSF, TABLE_NAME, SEG_OWNER, OPERATION, USERNAME, ROW_ID, ROLLBACK FROM V$LOGMNR_CONTENTS WHERE SCN > ? AND SCN <= ? AND ((OPERATION_CODE IN (5,34) AND USERNAME NOT IN ('SYS','SYSTEM','SYSTEM')) OR (OPERATION_CODE IN (7,36)) OR (OPERATION_CODE IN (1,2,3) AND TABLE_NAME != 'LOG_MINING_FLUSH' AND SEG_OWNER NOT IN ('APPQOSSYS','AUDSYS','CTXSYS','DVSYS','DBSFWUSER','DBSNMP','GSMADMIN_INTERNAL','LBACSYS','MDSYS','OJVMSYS','OLAPSYS','ORDDATA','ORDSYS','OUTLN','SYS','SYSTEM','WMSYS','XDB') AND (REGEXP_LIKE(SEG_OWNER,'^SCOTT$','i')) AND (REGEXP_LIKE(SEG_OWNER || '.' || TABLE_NAME,'^SCOTT.aaa$','i')) )), Error Msg = ORA-00603: ORACLE server session terminated by fatal error
ORA-27102: out of memory
Linux-x86_64 Error: 12: Cannot allocate memory
Additional information: 108
Additional information: 655360
ORA-04030: out of process memory when trying to allocate 8392728 bytes (krvxdups: priv,redo read buffer)

	at oracle.jdbc.driver.T4CTTIoer11.processError(T4CTTIoer11.java:513)
	... 22 more

I found that LogMiner is restarted on every mining loop, but the mining session is only ended when a log switch occurs, and each LogMiner start consumes additional PGA memory.
If the table data does not change, the redo logs are never switched or archived, so the mining session is never ended and PGA memory keeps growing until this error eventually occurs.
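This PGA growth can be observed directly on the database side while the job runs. A hedged diagnostic query (requires SELECT access to the v$session and v$process views; the username filter is a placeholder for whatever account the connector logs in with):

```sql
-- Per-process PGA usage of the connector's sessions; rerun to watch it grow.
SELECT s.sid,
       s.program,
       ROUND(p.pga_used_mem  / 1024 / 1024) AS pga_used_mb,
       ROUND(p.pga_alloc_mem / 1024 / 1024) AS pga_alloc_mb
  FROM v$session s
  JOIN v$process p ON p.addr = s.paddr
 WHERE s.username = 'FLINKUSER'          -- placeholder connector user
 ORDER BY p.pga_used_mem DESC;
```

If pga_used_mb for the LogMiner session climbs steadily between reruns while no log switch happens, that matches the leak described above.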

The relevant part of the LogMinerStreamingChangeEventSource class is as follows:

if (hasLogSwitchOccurred()) {
	// This is the way to mitigate PGA leaks.
	// With one mining session, it grows and maybe there is another way to flush PGA.
	// At this point we use a new mining session
	LOGGER.trace("Ending log mining startScn={}, endScn={}, offsetContext.getScn={}, strategy={}, continuous={}",
			startScn, endScn, offsetContext.getScn(), strategy, isContinuousMining);
	endMining(jdbcConnection);

	initializeRedoLogsForMining(jdbcConnection, true, archiveLogRetention);

	abandonOldTransactionsIfExist(jdbcConnection, transactionalBuffer);

	// This needs to be re-calculated because building the data dictionary will force the
	// current redo log sequence to be advanced due to a complete log switch of all logs.
	currentRedoLogSequences = getCurrentRedoLogSequences();
}

startLogMining(jdbcConnection, startScn, endScn, strategy, isContinuousMining, streamingMetrics);

cheng66551 avatar Jan 19 '22 07:01 cheng66551

Yes, I have the same problem; I am using CDC 2.1.1. This is a serious issue. On my Oracle server (4 GB RAM), a CDC job submitted for a single table can barely run 3 hours before the server goes down. If you run the following: alter system switch logfile;

the Oracle PGA drops immediately, but that is not feasible in production. Flink CDC 2.1.1 bundles Debezium 1.5, which is a fairly old version; I wonder whether the latest Debezium 1.8 has improved this. This is the reason my company cannot adopt Flink CDC.

jtsunxp avatar Feb 18 '22 07:02 jtsunxp

Yes, I have the same problem; I am using CDC 2.1.0. How was it solved?

lidong2021 avatar Feb 24 '22 03:02 lidong2021

You can configure a Debezium option to fix the problem of the Oracle connection process's memory growing continuously; debezium.log.mining.transaction.retention.hours=1 solved the problem.
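For reference, in a Flink SQL job such Debezium options are passed to the Oracle CDC connector with the debezium. prefix. A minimal sketch (the connection details and the ID/NAME columns are placeholders; only the SCOTT.aaa table name comes from the log above):

```sql
-- Hedged sketch: passing the retention option through the Flink oracle-cdc connector.
CREATE TABLE aaa_source (
    ID INT,                               -- hypothetical columns
    NAME STRING,
    PRIMARY KEY (ID) NOT ENFORCED
) WITH (
    'connector' = 'oracle-cdc',
    'hostname' = 'oracle-host',           -- placeholder
    'port' = '1521',
    'username' = 'flinkuser',             -- placeholder
    'password' = 'flinkpw',               -- placeholder
    'database-name' = 'ORCL',             -- placeholder
    'schema-name' = 'SCOTT',
    'table-name' = 'aaa',
    -- options prefixed with 'debezium.' are handed to the embedded Debezium connector
    'debezium.log.mining.transaction.retention.hours' = '1'
);
```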

ffyygg avatar Mar 12 '22 12:03 ffyygg

You can configure a Debezium option to fix the problem of the Oracle connection process's memory growing continuously; debezium.log.mining.transaction.retention.hours=1 solved the problem.

With this configuration, wouldn't data from long-running transactions be lost?

zhujian86 avatar Mar 18 '22 02:03 zhujian86

You can configure a Debezium option to fix the problem of the Oracle connection process's memory growing continuously; debezium.log.mining.transaction.retention.hours=1 solved the problem.

I configured it, but it did not solve the problem.

zhujian86 avatar Mar 21 '22 01:03 zhujian86

The upstream project has given a solution; see whether it works for you. Forcing Oracle to switch logs is also simple and crude:

ALTER SYSTEM SET archive_lag_target=1200 scope=both;

https://issues.redhat.com/browse/DBZ-4963

https://debezium.io/documentation/reference/1.9/connectors/oracle.html#oracle-pga-aggregate-limit
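For context on the second link: PGA_AGGREGATE_LIMIT exists only on Oracle 12c and later (not 11g), and the values below are illustrative, not recommendations:

```sql
-- Inspect the current PGA settings (12c+).
SELECT name, value FROM v$parameter
 WHERE name IN ('pga_aggregate_target', 'pga_aggregate_limit');

-- Raise the hard cap so a growing LogMiner session fails with ORA-04036
-- instead of exhausting OS memory (size is illustrative).
ALTER SYSTEM SET pga_aggregate_limit = 4G SCOPE=BOTH;
```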

zhujian86 avatar Apr 12 '22 01:04 zhujian86

Flink version: 1.14.6, Oracle 11g. I ran into the same problem.
Flink CDC version: 2.3.0, whose bundled debezium.version is 1.6.4.Final. That is too old. Are there detailed steps for upgrading Debezium (adjusting the source code)?

liuyitian0 avatar Jun 02 '23 03:06 liuyitian0

@zhujian86 I ran into a similar problem. Have you solved it?

wangzhen-wudangshan avatar Jun 05 '23 05:06 wangzhen-wudangshan

Closing this issue because it was created before version 2.3.0 (2022-11-10). Please try the latest version of Flink CDC to see if the issue has been resolved. If the issue is still valid, kindly report it on Apache Jira under project Flink with component tag Flink CDC. Thank you! cc @GOODBOY008

gong avatar Feb 03 '24 06:02 gong

Closing this issue because it was created before version 2.3.0 (2022-11-10). Please try the latest version of Flink CDC to see if the issue has been resolved. If the issue is still valid, kindly report it on Apache Jira under project Flink with component tag Flink CDC. Thank you!

PatrickRen avatar Feb 28 '24 15:02 PatrickRen